Re: Handle CONSTRUCTOR in operand_equal_p

2015-10-22 Thread Richard Biener
On Wed, 21 Oct 2015, Jan Hubicka wrote:

> Hi,
> this is updated patch I am going to commit.  As discussed, we also need to 
> match
> non-empty CONSTRUCTOR of vectors, but those should never be having CONSTANT 
> flags
> set, so they need care in the other path trhough operand_equal_p, so I will 
> post
> separate patch for that.
> 
> I updated the comment per Richard's comment and added a testcase.  Vuriously
> enoug we fold "val? (struct a){} : (struct a){}" only in PRE (tailmerging).
> W/o the patch even .optimized dump has the test and we fold it away only after
> lowering to RTL:
> ret (int val)
> {
>   struct a D.1764;
> 
>   :
>   if (val_2(D) != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   D.1764 = {};
>   goto ;
> 
>   :
>   D.1764 = {};
> 
>   :
>   return D.1764;
> 
> }
> 
> ret:
> .LFB0:
> .cfi_startproc
> xorl%eax, %eax
> ret
> .cfi_endproc
> 
> With the patch we get:
> 
> ret (int val)
> {
>   struct a D.1764;
> 
>   :
>   D.1764 = {};
>   return D.1764;
> 
> }

But only via GENERIC folding I suppose.  Yes, we don't value-number
aggregates and generally PRE (and DOM via excessive jump-threading)
is the only pass that remotely handles this kind of situation.

code hoisting/sinking would maybe catch this but as this involves
memory I'm not sure the implementation ontop of PRE that is stuck
in some PR would catch it.

Richard.

> Honza
> 
>   * fold-const.c (operand_equal_p): Add code matching empty
>   constructors.
>   * gcc.dg/tree-ssa/operand-equal-1.c: Verify that empty constructors
>   are matched.
> 
> Index: fold-const.c
> ===
> --- fold-const.c  (revision 229133)
> +++ fold-const.c  (working copy)
> @@ -2892,6 +2892,11 @@ operand_equal_p (const_tree arg0, const_
>   return operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0),
>   flags | OEP_ADDRESS_OF
>   | OEP_CONSTANT_ADDRESS_OF);
> +  case CONSTRUCTOR:
> + /* In GIMPLE empty constructors are allowed in initializers of
> +aggregates.  */
> + return (!vec_safe_length (CONSTRUCTOR_ELTS (arg0))
> + && !vec_safe_length (CONSTRUCTOR_ELTS (arg1)));
>default:
>   break;
>}
> Index: testsuite/gcc.dg/tree-ssa/operand-equal-1.c
> ===
> --- testsuite/gcc.dg/tree-ssa/operand-equal-1.c   (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/operand-equal-1.c   (revision 0)
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-pre" } */
> +struct a {int a,b;};
> +struct a ret(int val)
> +{
> +   return val? (struct a){} : (struct a){};
> +}
> +/* { dg-final { scan-tree-dump-not "if " "pre"} } */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Add non-constant vector ctors to operand_equal_p

2015-10-22 Thread Richard Biener
On Thu, 22 Oct 2015, Jan Hubicka wrote:

> Hi,
> this patch adds matching of non-constant CONSTRUCTOR expressions into
> operand_equal_p. As discussed with Richard, those can happen when we are
> building vectors out of components.  I also added a testcase that triggers 
> this
> path and gets folded bit earlier (tree-ssa fold it in tree-pre eventually even
> at mainline). Again, this will become more relevant for ipa-icf.
> The code has few positives in testsuite (mostly in autovec), none in 
> bootstrap.
> 
> The code can't handle incomplete ctors that match complete:
> 
> typedef char __attribute__ ((vector_size (4))) v4qi;
> v4qi v;
> void ret(char a)
> {
>   v4qi c={a,a,0,0},d={a,a};
>   v = (c!=d);
> }
> 
> gets compiled as (optimized dump):
> 
> ;; Function ret (ret, funcdef_no=0, decl_uid=1758, cgraph_uid=0, 
> symbol_order=1)
> 
> ret (char a)
> {
>   v4qi d;
>   v4qi c;
>   vector(4) signed char _4;
>   char _7;
>   char _8;
>   signed char _9;
>   char _10;
>   char _11;
>   signed char _12;
>   char _13;
>   char _14;
>   signed char _15;
>   char _16;
>   char _17;
>   signed char _18;
> 
>   :
>   c_2 = {a_1(D), a_1(D), 0, 0};
>   d_3 = {a_1(D), a_1(D)};
>   _7 = a_1(D);
>   _8 = a_1(D);
>   _9 = 0;
>   _10 = a_1(D);
>   _11 = a_1(D);
>   _12 = 0;
>   _13 = 0;
>   _14 = 0;
>   _15 = 0;
>   _16 = 0;
>   _17 = 0;
>   _18 = 0;
>   _4 = {_9, _12, _15, _18};
>   v = _4;
>   return;
> }
> 
> This is produced by forwprop4 by:
> 
>   :
>   c_2 = {a_1(D), a_1(D), 0, 0};
>   d_3 = {a_1(D), a_1(D)};
>   _7 = BIT_FIELD_REF ;
>   _8 = BIT_FIELD_REF ;
>   _9 = _7 != _8 ? -1 : 0;
>   _10 = BIT_FIELD_REF ;
>   _11 = BIT_FIELD_REF ;
>   _12 = _10 != _11 ? -1 : 0;
>   _13 = BIT_FIELD_REF ;
>   _14 = BIT_FIELD_REF ;
>   _15 = _13 != _14 ? -1 : 0;
>   _16 = BIT_FIELD_REF ;
>   _17 = BIT_FIELD_REF ;
>   _18 = _16 != _17 ? -1 : 0;
>   _4 = {_9, _12, _15, _18};
> 
> which gets done by veclower (so PRE misses its change to clean up the
> redundancies) from:
> 
>   :
>   c_2 = {a_1(D), a_1(D), 0, 0};
>   d_3 = {a_1(D), a_1(D)};
>   _4 = VEC_COND_EXPR ;
> 
> Which clearly shows we have more than one issue here.  I would rather
> canonicalize the ctors and strenghten definition of gimple to require them to
> be always full.

Agreed.  I think this requires changes in the FEs though.

>  I suppose we want to eitehr cleanup after veclower or make
> it bit smarter about duplications.
> 
> Bootstrapped/regtested x86_64-linux. OK?

Ok.

Thanks,
Richard.

>   * fold-const.c (operand_equal_p): Handle matching of vector
>   constructors.
>   * gcc.dg/tree-ssa/operand-equal-2.c: New testcase.
> 
> Index: testsuite/gcc.dg/tree-ssa/operand-equal-2.c
> ===
> --- testsuite/gcc.dg/tree-ssa/operand-equal-2.c   (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/operand-equal-2.c   (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1" } */
> +
> +typedef char __attribute__ ((vector_size (4))) v4qi;
> +
> +v4qi v;
> +void ret(char a)
> +{
> +  v4qi c={a,a,a,a},d={a,a,a,a};
> +  v = (c!=d);
> +}
> +/* { dg-final { scan-tree-dump "v = . 0, 0, 0, 0 ." "forwprop1"} } */
> Index: fold-const.c
> ===
> --- fold-const.c  (revision 229153)
> +++ fold-const.c  (working copy)
> @@ -2809,11 +2809,12 @@ operand_equal_p (const_tree arg0, const_
>   return 0;
>  }
>  
> -  /* This is needed for conversions and for COMPONENT_REF.
> - Might as well play it safe and always test this.  */
> +  /* When not checking adddresses, this is needed for conversions and for
> + COMPONENT_REF.  Might as well play it safe and always test this.  */
>if (TREE_CODE (TREE_TYPE (arg0)) == ERROR_MARK
>|| TREE_CODE (TREE_TYPE (arg1)) == ERROR_MARK
> -  || TYPE_MODE (TREE_TYPE (arg0)) != TYPE_MODE (TREE_TYPE (arg1)))
> +  || (TYPE_MODE (TREE_TYPE (arg0)) != TYPE_MODE (TREE_TYPE (arg1))
> +   && !(flags & OEP_ADDRESS_OF)))
>  return 0;
>  
>/* If ARG0 and ARG1 are the same SAVE_EXPR, they are necessarily equal.
> @@ -3149,6 +3150,56 @@ operand_equal_p (const_tree arg0, const_
> && DECL_BUILT_IN_CLASS (arg0) == DECL_BUILT_IN_CLASS (arg1)
> && DECL_FUNCTION_CODE (arg0) == DECL_FUNCTION_CODE (arg1));
>  
> +case tcc_exceptional:
> +  if (TREE_CODE (arg0) == CONSTRUCTOR)
> + {
> +   /* In GIMPLE constructors are used only to build vectors from
> +  elements.  Individual elements in the constructor must be
> +  indexed in increasing order and form an initial sequence.
> +
> +  We make no effort to compare constructors in generic.
> +  (see sem_variable::equals in ipa-icf which can do so for
> +   constants).  */
> +   if (!VECTOR_TYPE_P (TREE_TYPE (arg0))
> +   || !VECTOR_TYPE_P (TREE_TYPE (arg1)))
> + return 0;
> +
> +   /* Be sure t

Re: [PATCH v2 10/13] Avoid CSE of MEMs in different address spaces

2015-10-22 Thread Richard Biener
On Wed, Oct 21, 2015 at 7:12 PM, Richard Henderson  wrote:
> On 10/21/2015 03:37 AM, Jeff Law wrote:
>>
>> On 10/20/2015 03:27 PM, Richard Henderson wrote:
>>>
>>> ---
>>>   gcc/cselib.c | 22
>>> +-
>>>   gcc/fold-const.c | 14 +++---
>>>   gcc/testsuite/gcc.target/i386/addr-space-2.c | 11 +++
>>>   3 files changed, 31 insertions(+), 16 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-2.c
>>
>> This part is fine.  Does the one testcase cover both the cselib.c &
>> fold-const.c fix?  If not, it'd be good to cover both if we can.
>
>
> Yes it does.
>
> The fold-const fix gets us as far as .optimized, but then the cselib fix
> gets us past postreload-cse.

These fixes are appropriate for backporting as well I think.

Richard.
>
> r~


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Wed, Oct 21, 2015 at 9:00 PM, Nathan Sidwell  wrote:
> This patch implements a new internal function that has a 'uniqueness'
> property.   Jump-threading cannot clone it and tail-merging cannot combine
> multiple instances.
>
> The uniqueness is implemented by a new gimple fn,
> gimple_call_internal_unique_p.  Routines that check for identical or
> cloneable calls are augmented to check this property.  These are:
>
> * tree-ssa-threadedge, which is figuring out if jump threading is a win.
> Jump threading is inhibited.
>
> * gimple_call_same_target_p, used for tail merging and similar transforms.
> Two calls of IFN_UNIQUE will never be  the same target.
>
> * tracer.c, which is determining whether to clone a region.
>
> Interestingly jump threading avoids cloning volatile asms (which it admits
> is conservatively safe), but the tracer does not. I wonder if there's a
> latent problem in tracer?
>
> The reason I needed a function with this property is to  preserve the
> looping structure of a function's CFG.  As mentioned in the intro, we mark
> up loops (using this builtin), so the example I gave has the following
> inserts:
>
> #pragma acc parallel ...
> {
>  // single mode here
> #pragma acc loop ...
> IFN_UNIQUE (FORKING  ...)
> for (i = 0; i < N; i++) // loop 1
>   ... // partitioned mode here
> IFN_UNIQUE (JOINING ...)
>
> if (expr) // single mode here
> #pragma acc loop ...
>   IFN_UNIQUE (FORKING ...)
>   for (i = 0; i < N; i++) // loop 2
> ... // partitioned mode here
>   IFN_UNIQUE (JOINING ...)
> }
>
> The properly nested loop property of the CFG is preserved through the
> compilation.  This is important as (a) it allows later passes to reconstruct
> this looping structure and (b) hardware constraints require a partioned
> region end for all partitioned threads at a single instruction.
>
> Until I added this unique property, original bring-up  of partitioned
> execution would hit cases of split loops ending in multiple cloned JOINING
> markers and similar cases.
>
> To distinguish different uses of the UNIQUE function, I use the first
> argument, which is expected to be an INTEGER_CST.  I figured this better
> than using multiple new internal fns, all with the unique property, as the
> latter would need (at least) a range check in gimple_call_internal_unique_p
> rather than a simple equality.
>
> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct internal
> fns.  This replaces that scheme.
>
> ok?

Hmm, I'd just have used gimple_has_volatile_ops on the call?  That
should have the
desired effects.

Richard.

> nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 9:48 AM, Richard Biener
 wrote:
> On Wed, Oct 21, 2015 at 9:00 PM, Nathan Sidwell  wrote:
>> This patch implements a new internal function that has a 'uniqueness'
>> property.   Jump-threading cannot clone it and tail-merging cannot combine
>> multiple instances.
>>
>> The uniqueness is implemented by a new gimple fn,
>> gimple_call_internal_unique_p.  Routines that check for identical or
>> cloneable calls are augmented to check this property.  These are:
>>
>> * tree-ssa-threadedge, which is figuring out if jump threading is a win.
>> Jump threading is inhibited.
>>
>> * gimple_call_same_target_p, used for tail merging and similar transforms.
>> Two calls of IFN_UNIQUE will never be  the same target.
>>
>> * tracer.c, which is determining whether to clone a region.
>>
>> Interestingly jump threading avoids cloning volatile asms (which it admits
>> is conservatively safe), but the tracer does not. I wonder if there's a
>> latent problem in tracer?
>>
>> The reason I needed a function with this property is to  preserve the
>> looping structure of a function's CFG.  As mentioned in the intro, we mark
>> up loops (using this builtin), so the example I gave has the following
>> inserts:
>>
>> #pragma acc parallel ...
>> {
>>  // single mode here
>> #pragma acc loop ...
>> IFN_UNIQUE (FORKING  ...)
>> for (i = 0; i < N; i++) // loop 1
>>   ... // partitioned mode here
>> IFN_UNIQUE (JOINING ...)
>>
>> if (expr) // single mode here
>> #pragma acc loop ...
>>   IFN_UNIQUE (FORKING ...)
>>   for (i = 0; i < N; i++) // loop 2
>> ... // partitioned mode here
>>   IFN_UNIQUE (JOINING ...)
>> }
>>
>> The properly nested loop property of the CFG is preserved through the
>> compilation.  This is important as (a) it allows later passes to reconstruct
>> this looping structure and (b) hardware constraints require a partioned
>> region end for all partitioned threads at a single instruction.
>>
>> Until I added this unique property, original bring-up  of partitioned
>> execution would hit cases of split loops ending in multiple cloned JOINING
>> markers and similar cases.
>>
>> To distinguish different uses of the UNIQUE function, I use the first
>> argument, which is expected to be an INTEGER_CST.  I figured this better
>> than using multiple new internal fns, all with the unique property, as the
>> latter would need (at least) a range check in gimple_call_internal_unique_p
>> rather than a simple equality.
>>
>> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct internal
>> fns.  This replaces that scheme.
>>
>> ok?
>
> Hmm, I'd just have used gimple_has_volatile_ops on the call?  That
> should have the
> desired effects.

That is, whatever new IFNs you need are ok, but special-casing them is not
necessary if you properly mark the calls as volatile.

Richard.

> Richard.
>
>> nathan


Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-22 Thread Richard Biener
On Wed, Oct 21, 2015 at 6:27 PM, Jakub Jelinek  wrote:
> On Wed, Oct 21, 2015 at 06:22:37PM +0200, Bernd Schmidt wrote:
>> On 10/21/2015 06:18 PM, Jeff Law wrote:
>> >To avoid conditionally compiled code.  I'm of the opinion we should be
>> >stomping out as much as we reasonably can.
>>
>> Yeah, I get that, but the point I'm trying to make is that if you get rid of
>> all conditional compilation, you don't need either ENABLE_CHECKING or
>> CHECKING_P, because you'll be testing flag_checking. And if some conditional
>> compilation remains (let's say in the cumulative_args case), then there's no
>> win from changing the spelling from ENABLE_CHECKING to CHECKING_P.
>
> So, what is the plan with flag_checking?  Will --enable-checking=yes vs.
> release just affect the default value of that flag, or will it be turned
> into 0 for release builds?  If the former, I'd at least like to see
> a __builtin_expect making it unlikely for the release builds and likely for
> the development builds (to match the default value).

My plan is to make flag_checking a runtine controllable value (via -fchecking)
so if you run into an issue with a release build you can still do -fchecking
with the same build.  We can't (and shouldn't) guard everything with
flag_checking
because the hit on compile-time with flag_checking == 0 will be too
big (like the
tree checking macros).  But for example all the verify_XXX () calls
can be safely
guarded by flag_checking.

So we _do_ want a always-defined ENABLE_CHECKING (or CHECKING_P) to
be able to remove conditional compilation for cases where we do not want to
do runtime checks for performance reasons.

Richard.

> Jakub


Re: [PATCH v2 08/13] Add TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 5:12 AM, Sandra Loosemore
 wrote:
> On 10/20/2015 03:27 PM, Richard Henderson wrote:
>>
>>
>> +@deftypefn {Target Hook} bool TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
>> (addr_space_t @var{as})
>> +Define this to modify the default handling of address 0 for the
>> +address space.  Return true if 0 should be considered a valid address.
>> +@end deftypefn
>> +
>
>
> I'm confused by this new hook.  How does it interact with
> -fdelete-null-pointer-checks?  E.g. nios2-elf defaults
> flag_delete_null_pointer_checks to 0 precisely because address 0 is
> legitimate on that target.  The avr and cr16 backends simply override
> flag_delete_null_pointer_checks.  Do backends that already frob one thing
> need to frob the other as well?  Are there any changes to the user
> documentation for -fdelete-null-pointer-checks required?

I suppose a cleanup possibility would be to get rid of that "abuse" of
flag_delete_null_pointer_checks and make all targets now defining that to zero
instead implement the hook above but for the default address-space.
And then make all places checking for flag_delete_null_pointer_checks (or
at least some) use the target hook instead.

Richard.

> -Sandra
>


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 09:49:29AM +0200, Richard Biener wrote:
> >> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct internal
> >> fns.  This replaces that scheme.
> >>
> >> ok?
> >
> > Hmm, I'd just have used gimple_has_volatile_ops on the call?  That
> > should have the
> > desired effects.
> 
> That is, whatever new IFNs you need are ok, but special-casing them is not
> necessary if you properly mark the calls as volatile.

I don't see gimple_has_volatile_ops used in tracer.c or
tree-ssa-threadedge.c.  Setting gimple_has_volatile_ops on those IFNs is
fine, but I think they are even stronger than that.

Jakub


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:00:47PM -0400, Nathan Sidwell wrote:
> To distinguish different uses of the UNIQUE function, I use the first
> argument, which is expected to be an INTEGER_CST.  I figured this better
> than using multiple new internal fns, all with the unique property, as the
> latter would need (at least) a range check in gimple_call_internal_unique_p
> rather than a simple equality.
> 
> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct internal
> fns.  This replaces that scheme.
> 
> ok?
> 
> nathan

> 2015-10-20  Nathan Sidwell  
>   Cesar Philippidis  
>   
>   * internal-fn.c (expand_UNIQUE): New.
>   * internal-fn.def (IFN_UNIQUE): New.
>   (IFN_UNIQUE_UNSPEC): Define.
>   * gimple.h (gimple_call_internal_unique_p): New.
>   * gimple.c (gimple_call_same_target_p): Check internal fn
>   uniqueness.
>   * tracer.c (ignore_bb_p): Check for IFN_UNIQUE call.
>   * tree-ssa-threadedge.c
>   (record_temporary_equivalences_from_stmts): Likewise.

This is generally fine with me, but please work with Richi to find
something acceptable to him too.

> +DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW | ECF_LEAF, NULL)

Are you sure about the ECF_LEAF?  I mean, while the function can't
call back to your code, I'd expect you want it as kind of strong
optimization barrier too.

> +#define IFN_UNIQUE_UNSPEC 0  /* Undifferentiated UNIQUE.  */
> Index: tracer.c
> ===
> --- tracer.c  (revision 229096)
> +++ tracer.c  (working copy)
> @@ -93,6 +93,7 @@ bb_seen_p (basic_block bb)
>  static bool
>  ignore_bb_p (const_basic_block bb)
>  {
> +  gimple_stmt_iterator gsi;
>gimple *g;
>  
>if (bb->index < NUM_FIXED_BLOCKS)
> @@ -106,6 +107,17 @@ ignore_bb_p (const_basic_block bb)
>if (g && gimple_code (g) == GIMPLE_TRANSACTION)
>  return true;
>  
> +  /* Ignore blocks containing non-clonable function calls.  */
> +  for (gsi = gsi_start_bb (CONST_CAST_BB (bb));
> +   !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> +  g = gsi_stmt (gsi);
> +
> +  if (is_gimple_call (g) && gimple_call_internal_p (g)
> +   && gimple_call_internal_unique_p (as_a  (g)))
> + return true;
> +}

Do you have to scan the whole bb?  E.g. don't or should not those
unique IFNs force end of bb?

Jakub


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 9:59 AM, Jakub Jelinek  wrote:
> On Thu, Oct 22, 2015 at 09:49:29AM +0200, Richard Biener wrote:
>> >> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct 
>> >> internal
>> >> fns.  This replaces that scheme.
>> >>
>> >> ok?
>> >
>> > Hmm, I'd just have used gimple_has_volatile_ops on the call?  That
>> > should have the
>> > desired effects.
>>
>> That is, whatever new IFNs you need are ok, but special-casing them is not
>> necessary if you properly mark the calls as volatile.
>
> I don't see gimple_has_volatile_ops used in tracer.c or
> tree-ssa-threadedge.c.  Setting gimple_has_volatile_ops on those IFNs is
> fine, but I think they are even stronger than that.

Hmm, indeed.  Now I fail to see how the implemented property "preserves
the CFG looping structure".  And I would have expected can_copy_bbs_p
to be adjusted instead (catching more cases and the threading and tracer
case as well).

As far as I can see nothing would prevent dissolving the loop by completely
unolling it for example.  Or deleting it because it has no side-effects.

So you'd need to be more precise as to what properties you are trying to
preserve by placing a single stmt somewhere.

Richard.

> Jakub


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 10:04 AM, Jakub Jelinek  wrote:
> On Wed, Oct 21, 2015 at 03:00:47PM -0400, Nathan Sidwell wrote:
>> To distinguish different uses of the UNIQUE function, I use the first
>> argument, which is expected to be an INTEGER_CST.  I figured this better
>> than using multiple new internal fns, all with the unique property, as the
>> latter would need (at least) a range check in gimple_call_internal_unique_p
>> rather than a simple equality.
>>
>> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such distinct internal
>> fns.  This replaces that scheme.
>>
>> ok?
>>
>> nathan
>
>> 2015-10-20  Nathan Sidwell  
>>   Cesar Philippidis  
>>
>>   * internal-fn.c (expand_UNIQUE): New.
>>   * internal-fn.def (IFN_UNIQUE): New.
>>   (IFN_UNIQUE_UNSPEC): Define.
>>   * gimple.h (gimple_call_internal_unique_p): New.
>>   * gimple.c (gimple_call_same_target_p): Check internal fn
>>   uniqueness.
>>   * tracer.c (ignore_bb_p): Check for IFN_UNIQUE call.
>>   * tree-ssa-threadedge.c
>>   (record_temporary_equivalences_from_stmts): Likewise.
>
> This is generally fine with me, but please work with Richi to find
> something acceptable to him too.
>
>> +DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW | ECF_LEAF, NULL)
>
> Are you sure about the ECF_LEAF?  I mean, while the function can't
> call back to your code, I'd expect you want it as kind of strong
> optimization barrier too.
>
>> +#define IFN_UNIQUE_UNSPEC 0  /* Undifferentiated UNIQUE.  */
>> Index: tracer.c
>> ===
>> --- tracer.c  (revision 229096)
>> +++ tracer.c  (working copy)
>> @@ -93,6 +93,7 @@ bb_seen_p (basic_block bb)
>>  static bool
>>  ignore_bb_p (const_basic_block bb)
>>  {
>> +  gimple_stmt_iterator gsi;
>>gimple *g;
>>
>>if (bb->index < NUM_FIXED_BLOCKS)
>> @@ -106,6 +107,17 @@ ignore_bb_p (const_basic_block bb)
>>if (g && gimple_code (g) == GIMPLE_TRANSACTION)
>>  return true;
>>
>> +  /* Ignore blocks containing non-clonable function calls.  */
>> +  for (gsi = gsi_start_bb (CONST_CAST_BB (bb));
>> +   !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  g = gsi_stmt (gsi);
>> +
>> +  if (is_gimple_call (g) && gimple_call_internal_p (g)
>> +   && gimple_call_internal_unique_p (as_a  (g)))
>> + return true;
>> +}
>
> Do you have to scan the whole bb?  E.g. don't or should not those
> unique IFNs force end of bb?

Yeah, please make them either end or start a BB so we have to check
at most a single stmt.  ECF_RETURNS_TWICE should achieve that,
it also makes it a code motion barrier.

Richard.

> Jakub


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:09:55PM -0400, Nathan Sidwell wrote:
> Bernd, any comments?

Just a few questions, otherwise it is a PTX territory you PTX maintainers
should review.

>   (*oacc_ntid_insn,  oacc_ntid, *oacc_tid_insn, oacc_tid): Delete.

Extra space.

> +/* Size of buffer needed to broadcast across workers.  This is used
> +   for both worker-neutering and worker broadcasting.   It is shared
> +   by all functions emitted.  The buffer is placed in shared memory.
> +   It'd be nice if PTX supported common blocks, because then this
> +   could be shared across TUs (taking the largest size).  */
> +static unsigned worker_bcast_hwm;

As discussed in another thread for another patch, is hwm the best acronym
here?  If it is the size, then why not worker_bcast_size?

> @@ -2129,6 +3242,19 @@ nvptx_file_end (void)
>FOR_EACH_HASH_TABLE_ELEMENT (*needed_fndecls_htab, decl, tree, iter)
>  nvptx_record_fndecl (decl, true);
>fputs (func_decls.str().c_str(), asm_out_file);
> +
> +  if (worker_bcast_hwm)
> +{
> +  /* Define the broadcast buffer.  */
> +
> +  worker_bcast_hwm = (worker_bcast_hwm + worker_bcast_align - 1)
> + & ~(worker_bcast_align - 1);
> +  
> +  fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_bcast_name);
> +  fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
> +worker_bcast_align,
> +worker_bcast_name, worker_bcast_hwm);
> +}

So, is the worker broadcast buffer effectively a file scope .shared
variable?  My worry is that as .shared is quite limited resource, if you
compile many TUs and each allocates its own broadcast buffer you run out of
shared memory.  Is there any way how to share the broadcast buffers in
between different TUs (other than LTO)?

Jakub


Re: [OpenACC 3/11] new target hook

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:13:26PM -0400, Nathan Sidwell wrote:
> 2015-10-20  Nathan Sidwell  
> 
>   * target.def (fork_join): New GOACC hook.
>   * targhooks.h (default_goacc_fork_join): Declare.
>   * omp-low.c (default_goacc_forkjoin): New.
>   * doc/tm.texi.in (TARGET_GOACC_FORK_JOIN): Add.
>   * doc/tm.texi: Regenerate.
>   * config/nvptx/nvptx.c (nvptx_xform_fork_join): New.
>   (TARGET_GOACC_FOR_JOIN): Override.

This is ok, with nits.

> --- config/nvptx/nvptx.c  (revision 229096)
> +++ config/nvptx/nvptx.c  (working copy)
> @@ -2146,7 +2146,26 @@ nvptx_goacc_validate_dims (tree ARG_UNUS
>  
>return changed;
>  }
> -
> +
> +/* Determine whether fork & joins are needed.  */
> +
> +static bool
> +nvptx_xform_fork_join (gcall *call, const int dims[],
> +bool ARG_UNUSED (is_fork))

Why is this not called nvptx_goacc_fork_join when that is the name of
the target hook?

Jakub


Re: [OpenACC 4/11] C FE changes

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:16:20PM -0400, Nathan Sidwell wrote:
> 2015-10-20  Cesar Philippidis  
>   Thomas Schwinge  
>   James Norris  
>   Joseph Myers  
>   Julian Brown  
> 
>   * c-parser.c (c_parser_oacc_shape_clause): New.
>   (c_parser_oacc_simple_clause): New.
>   (c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
>   (OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

Ok, with one nit.

>  /* OpenACC:
> +   gang [( gang_expr_list )]
> +   worker [( expression )]
> +   vector [( expression )] */
> +
> +static tree
> +c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
> + const char *str, tree list)

I think it would be better to remove the c_kind argument and pass to this
function omp_clause_code kind instead.  The callers are already in a big
switch, with a separate call for each of the clauses.
After all, e.g. for c_parser_oacc_simple_clause you already do it that way
too.

> +{
> +  omp_clause_code kind;
> +  const char *id = "num";
> +
> +  switch (c_kind)
> +{
> +default:
> +  gcc_unreachable ();
> +case PRAGMA_OACC_CLAUSE_GANG:
> +  kind = OMP_CLAUSE_GANG;
> +  break;
> +case PRAGMA_OACC_CLAUSE_VECTOR:
> +  kind = OMP_CLAUSE_VECTOR;
> +  id = "length";
> +  break;
> +case PRAGMA_OACC_CLAUSE_WORKER:
> +  kind = OMP_CLAUSE_WORKER;
> +  break;
> +}

Then you can replace this switch with just if (kind == OMP_CLAUSE_VECTOR)
id = "length";

Jakub


[Committed] S/390: PR68015 Fix ICE in s390_emit_compare

2015-10-22 Thread Andreas Krebbel
Committed to head and GCC 5 branch

gcc/ChangeLog:

2015-10-22  Andreas Krebbel  

PR target/68015
* config/s390/s390.md (movcc): Emit compare only if we don't
already have a comparison result.

gcc/testsuite/ChangeLog:

2015-10-22  Andreas Krebbel  

PR target/68015
* gcc.target/s390/pr68015.c: New test.

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 1822459..ea65c74 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6108,8 +6108,13 @@
  (match_operand:GPR 2 "nonimmediate_operand" "")
  (match_operand:GPR 3 "nonimmediate_operand" "")))]
   "TARGET_Z196"
-  "operands[1] = s390_emit_compare (GET_CODE (operands[1]),
-XEXP (operands[1], 0), XEXP (operands[1], 
1));")
+{
+  /* Emit the comparison insn in case we do not already have a comparison 
result.  */
+  if (!s390_comparison (operands[1], VOIDmode))
+operands[1] = s390_emit_compare (GET_CODE (operands[1]),
+XEXP (operands[1], 0),
+XEXP (operands[1], 1));
+})
 
 ; locr, loc, stoc, locgr, locg, stocg
 (define_insn_and_split "*movcc"
diff --git a/gcc/testsuite/gcc.target/s390/pr68015.c 
b/gcc/testsuite/gcc.target/s390/pr68015.c
new file mode 100644
index 000..b0d1f35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr68015.c
@@ -0,0 +1,24 @@
+/* { dg-compile } */
+/* { dg-options "-O2 -march=z196" } */
+
+extern long useme (long, ...);
+
+void
+foo (void)
+{
+  long secs = useme (41);
+  long utc_secs = useme (42);
+  long h, m;
+
+  utc_secs = useme (42);
+  h = secs / 3600;
+  m = secs / 60;
+  if (utc_secs >= 86400)
+{
+  m = 59;
+  h--;
+  if (h < 0)
+   h = 23;
+}
+  useme (h, m);
+}



[PATCH] Move MINUS of pointer-plus expr foldings to match.pd

2015-10-22 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, the OMP changes
were approved by Jakub on IRC.

Richard.

2015-10-22  Richard Biener  

* fold-const.c (fold_addr_of_array_ref_difference): Properly
convert operands before folding a MINUS_EXPR.
(fold_binary_loc): Move simplification of MINUS_EXPR on
converted POINTER_PLUS_EXPRs ...
* match.pd: ... here.

c/
* c-typeck.c (c_finish_omp_clauses): Properly convert operands
before folding a MINUS_EXPR.

cp/
* semantics.c (cp_finish_omp_clause_depend_sink): Properly convert
before folding a MINUS_EXPR.
(finish_omp_clauses): Likewise.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 229119)
--- gcc/fold-const.c(working copy)
*** fold_addr_of_array_ref_difference (locat
*** 8846,8854 
   = fold_addr_of_array_ref_difference (loc, type, base0, base1)))
|| (INDIRECT_REF_P (base0)
  && INDIRECT_REF_P (base1)
! && (base_offset = fold_binary_loc (loc, MINUS_EXPR, type,
!TREE_OPERAND (base0, 0),
!TREE_OPERAND (base1, 0
|| operand_equal_p (base0, base1, OEP_ADDRESS_OF))
  {
tree op0 = fold_convert_loc (loc, type, TREE_OPERAND (aref0, 1));
--- 8864,8874 
   = fold_addr_of_array_ref_difference (loc, type, base0, base1)))
|| (INDIRECT_REF_P (base0)
  && INDIRECT_REF_P (base1)
! && (base_offset
!   = fold_binary_loc (loc, MINUS_EXPR, type,
!  fold_convert (type, TREE_OPERAND (base0, 0)),
!  fold_convert (type,
!TREE_OPERAND (base1, 0)
|| operand_equal_p (base0, base1, OEP_ADDRESS_OF))
  {
tree op0 = fold_convert_loc (loc, type, TREE_OPERAND (aref0, 1));
*** fold_binary_loc (location_t loc,
*** 9642,9689 
return NULL_TREE;
  
  case MINUS_EXPR:
-   /* Pointer simplifications for subtraction, simple reassociations. */
-   if (POINTER_TYPE_P (TREE_TYPE (arg1)) && POINTER_TYPE_P (TREE_TYPE 
(arg0)))
-   {
- /* (PTR0 p+ A) - (PTR1 p+ B) -> (PTR0 - PTR1) + (A - B) */
- if (TREE_CODE (arg0) == POINTER_PLUS_EXPR
- && TREE_CODE (arg1) == POINTER_PLUS_EXPR)
-   {
- tree arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
- tree arg01 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
- tree arg10 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
- tree arg11 = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
- return fold_build2_loc (loc, PLUS_EXPR, type,
- fold_build2_loc (loc, MINUS_EXPR, type,
-  arg00, arg10),
- fold_build2_loc (loc, MINUS_EXPR, type,
-  arg01, arg11));
-   }
- /* (PTR0 p+ A) - PTR1 -> (PTR0 - PTR1) + A, assuming PTR0 - PTR1 
simplifies. */
- else if (TREE_CODE (arg0) == POINTER_PLUS_EXPR)
-   {
- tree arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
- tree arg01 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
- tree tmp = fold_binary_loc (loc, MINUS_EXPR, type, arg00,
- fold_convert_loc (loc, type, arg1));
- if (tmp)
-   return fold_build2_loc (loc, PLUS_EXPR, type, tmp, arg01);
-   }
- /* PTR0 - (PTR1 p+ A) -> (PTR0 - PTR1) - A, assuming PTR0 - PTR1
-simplifies. */
- else if (TREE_CODE (arg1) == POINTER_PLUS_EXPR)
-   {
- tree arg10 = fold_convert_loc (loc, type,
-TREE_OPERAND (arg1, 0));
- tree arg11 = fold_convert_loc (loc, type,
-TREE_OPERAND (arg1, 1));
- tree tmp = fold_binary_loc (loc, MINUS_EXPR, type,
- fold_convert_loc (loc, type, arg0),
- arg10);
- if (tmp)
-   return fold_build2_loc (loc, MINUS_EXPR, type, tmp, arg11);
-   }
-   }
/* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
if (TREE_CODE (arg0) == NEGATE_EXPR
  && negate_expr_p (arg1)
--- 9662,9667 
Index: gcc/match.pd
===
*** gcc/match.pd(revision 229119)
--- gcc/match.pd(working copy)
*** (define_operator_list CPROJ BUILT_IN_CPR
*** 978,984 
 || (POINTER_TYPE_P (TREE_TYPE (@0))
 && T

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-10-22 Thread Andreas Schwab
+  /* Changes in machine mode are never useless conversions unless.  */

Unless what?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [OpenACC 5/11] C++ FE changes

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
> This patch is the C++ changes matching the C ones of patch 4.  In
> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
> as OpenMP's 'num_threads' clause.  One change to num_threads is the
> augmentation of a diagnostic to add %<...%>  markers to the clause name.

Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
keywords.  Something to fix eventually.

> 2015-10-20  Cesar Philippidis  
>   Thomas Schwinge  
>   James Norris  
>   Joseph Myers  
>   Julian Brown  
>   Nathan Sidwell 
> 
>   * parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>   vector, worker.
>   (cp_parser_oacc_simple_clause): New.
>   (cp_parser_oacc_shape_clause): New.

What I've said for the C FE patch, plus:

> +   if (cp_lexer_next_token_is (lexer, CPP_NAME)
> +   || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
> + {
> +   tree name_kind = cp_lexer_peek_token (lexer)->u.value;
> +   const char *p = IDENTIFIER_POINTER (name_kind);
> +   if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)

As static is a keyword, wouldn't it be better to just handle that case
using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?

Also, what is the exact grammar of the shape arguments?
Would be nice to describe the grammar, in the grammar you just say
expression, at least for vector/worker, which is clearly not accurate.

It seems the intent is that num: or length: or static: is optional, right?
But if that is the case, you should treat those as parsed only if followed
by :.  While static is a keyword, so you can't have a variable called like
that, having vector(length) or vector(num) should not be rejected.
So, I would have expected that it should test if it is RID_STATIC
followed by CPP_COLON (and only in that case consume those tokens),
or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
tokens), otherwise parse it as assignment expression.

The C FE may have similar issue.  Plus of course there should be testsuite
coverage for all the weird cases.

> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_WORKER:
> +   /* Operand 0 is the num: or length: argument.  */
> +   t = OMP_CLAUSE_OPERAND (c, 0);
> +   if (t == NULL_TREE)
> + break;
> +
> +   t = maybe_convert_cond (t);

Can you explain the maybe_convert_cond calls (in both cases here,
plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
The reason why it is used for OpenMP if and final clauses is that those have
a condition argument, either the condition is zero or non-zero (so
effectively it is turned into a bool).
But aren't the gang/vector/worker/vector_length arguments integers rather
than conditions?  I'd expect that finish_omp_clauses should verify
those operands are indeed integral expressions (if that is the requirement
in the standard), as it is something that for C++ can't be verified during
parsing, if arbitrary expressions are parsed there.

> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
> break;
>  
>   case OMP_CLAUSE_NUM_THREADS:
> -   t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
> -   if (t == error_mark_node)
> - remove = true;
> -   else if (!type_dependent_expression_p (t)
> -&& !INTEGRAL_TYPE_P (TREE_TYPE (t)))
> - {
> -   error ("num_threads expression must be integral");
> -   remove = true;
> - }
> -   else
> - {
> -   t = mark_rvalue_use (t);
> -   if (!processing_template_decl)
> - {
> -   t = maybe_constant_value (t);
> -   if (TREE_CODE (t) == INTEGER_CST
> -   && tree_int_cst_sgn (t) != 1)
> - {
> -   warning_at (OMP_CLAUSE_LOCATION (c), 0,
> -   "% value must be positive");
> -   t = integer_one_node;
> - }
> -   t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
> - }
> -   OMP_CLAUSE_NUM_THREADS_EXPR (c) = t;
> - }
> + case OMP_CLAUSE_NUM_GANGS:
> + case OMP_CLAUSE_NUM_WORKERS:
> + case OMP_CLAUSE_VECTOR_LENGTH:

If you are already merging some of the similar handling, please
handle OMP_CLAUSE_NUM_TEAMS and OMP_CLAUSE_NUM_TASKS the same way.

Jakub


Re: [OpenACC 6/11] Reduction initialization

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:24:13PM -0400, Nathan Sidwell wrote:
> 2015-10-20  Nathan Sidwell  
> 
>   * omp-low.c (oacc_init_rediction_array): New.
>   (oacc_initialize_reduction_data): Initialize array.

Ok.

Jakub


OpenACC constructs nesting checks (was: [gomp4,committed] Handle oacc region in oacc routine)

2015-10-22 Thread Thomas Schwinge
Hi Tom!

On Fri, 16 Oct 2015 11:17:25 +0200, Tom de Vries  wrote:
> this patch checks for occurance of oacc offload regions in oacc routines 
> (which means nested parallelism, which is currently not supported) and 
> gives an appropriate error message.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/parallel-in-routine.c
> @@ -0,0 +1,8 @@
> +#pragma acc routine
> +void
> +foo (void)
> +{
> +#pragma acc parallel /* { dg-error "OpenACC region inside of OpenACC 
> routine, nested parallelism not supported yet" } */
> +  ;
> +}

Please move such tests into
gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c and/or
gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c, and nesting tests
that are expected to succeed into
gcc/testsuite/c-c++-common/goacc/nesting-1.c and/or
gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c.  (I have not looked up
the corresponding Fortran testsuite files.)


When fixing that, please also add back (checking for appropriate
diagnostics) the tests that Nathan apparently didn't know where to put,
and thus removed, in

(gomp-4_0-branch r228678), and

(gomp-4_0-branch r228694).


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [OpenACC 7/11] execution model

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:42:26PM -0400, Nathan Sidwell wrote:
> +/*  Flags for an OpenACC loop.  */
> +
> +enum oacc_loop_flags
> +  {

Weird formatting.  I see either
enum foobarbaz {
  e1 = ...,
  e2 = ...
};
or
enum foobarbaz
{
  e1 = ...,
  e2 = ...
};
styles being used heavily, but not this one.

> +OLF_SEQ  = 1u << 0,  /* Explicitly sequential  */
> +OLF_AUTO = 1u << 1,  /* Compiler chooses axes.  */
> +OLF_INDEPENDENT = 1u << 2,   /* Iterations are known independent.  */
> +OLF_GANG_STATIC = 1u << 3,   /* Gang partitioning is static (has 
> op). */
> +
> +/* Explicitly specified loop axes.  */
> +OLF_DIM_BASE = 4,
> +OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
> +OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
> +OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
> +
> +OLF_MAX = OLF_DIM_BASE + GOMP_DIM_MAX
> +  };
> +

> +  if (checking)
> +{
> +  if (has_seq && (this_mask || has_auto))
> + error_at (gimple_location (stmt), "% overrides other OpenACC loop 
> specifiers");
> +  else if (has_auto && this_mask)
> + error_at (gimple_location (stmt), "% conflicts with other 
> OpenACC loop specifiers");
> +
> +  if (this_mask & outer_mask)
> + error_at (gimple_location (stmt), "inner loop uses same  OpenACC 
> parallelism as containing loop");

Too long lines.  Plus 2 spaces into one.
> + if (check && OMP_CLAUSE_OPERAND (c, 0))
> +   error_at (gimple_location (stmt),
> + "argument not permitted on %<%s%> clause in"

%qs instead of %<%s%> ?

> @@ -5769,6 +5885,166 @@ lower_send_shared_vars (gimple_seq *ilis
>  }
>  }
>  
> +/* Emit an OpenACC head marker call, encapulating the partitioning and
> +   other information that must be processed by the target compiler.
> +   Return the maximum number of dimensions the associated loop might
> +   be partitioned over.  */
> +
> +static unsigned
> +lower_oacc_head_mark (location_t loc, tree clauses,
> +   gimple_seq *seq, omp_context *ctx)
> +{
> +  unsigned levels = 0;
> +  unsigned tag = 0;
> +  tree gang_static = NULL_TREE;
> +  auto_vec args;

If you usually push there 3 or 4 arguments, wouldn't it be better to
just use auto_vec args; instead?

> +  if (gang_static)
> +{
> +  if (DECL_P  (gang_static))

Formatting, too many spaces.

> +  tree marker = build_int_cst
> +(integer_type_node, (head ? IFN_UNIQUE_OACC_HEAD_MARK
> +  : IFN_UNIQUE_OACC_TAIL_MARK));

I really don't like putting the arguments on a different from
the function name, unless you have to.
Here you can easily do say
  enum internal_fn marker_val = head ? IFN_UNIQUE_OACC_HEAD_MARK
 : IFN_UNIQUE_OACC_TAIL_MARK;
  tree marker = build_int_cst (integer_type_node, marker_val);
same number of lines, easier to read.

> +  gcall *call = gimple_build_call_internal
> +(IFN_UNIQUE, 1 + (tofollow != NULL_TREE), marker, tofollow);

Similarly.

> +  gcall *fork = gimple_build_call_internal
> + (IFN_UNIQUE, 2,
> +  build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK), place);
> +  gcall *join = gimple_build_call_internal
> + (IFN_UNIQUE, 2,
> +  build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_JOIN), place);

Likewise.  Just use a 
tree t = build_int_cst (unsigned_type_node, IFN_UNIQUE_OACC_FORK);
gcall *fork = gimple_build_call_internal (IFN_UNIQUE, 2, t, place);
etc.

> +  expr = build2 (TRUNC_MOD_EXPR, ivar_type, ivar,
> +  fold_convert (ivar_type, collapse->iters));
> +  expr = build2 (MULT_EXPR, diff_type, fold_convert (diff_type, expr),
> +  collapse->step);
> +  expr = build2 (plus_code, iter_type, collapse->base,
> +  fold_convert (plus_type, expr));

Shouldn't these be fold_build2 instead?
Of course Richi would prefer gimple_build, but omp-low.c has already
way too much of fold_build2 + force_gimple_operand_gsi code around
that it is fine with me this way.

>  /* An unduplicable, uncombinable function.  Generally used to preserve
> a CFG property in the face of jump threading, tail merging or
> other such optimizations.  The first argument distinguishes
> between uses.  Other arguments are as needed for use.  The return
> type depends on use too.  */
>  DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW | ECF_LEAF, NULL)
>  #define IFN_UNIQUE_UNSPEC 0  /* Undifferentiated UNIQUE.  */
> +
> +/* FORK and JOIN mark the points at which OpenACC partitioned
> +   execution is entered or exited.  They take an INTEGER_CST argument,
> +   indicating the axis of forking or joining and return nothing.  */
> +#define IFN_UNIQUE_OACC_FORK 1
> +#define IFN_UNIQUE_OACC_JOIN 2
> +/* HEAD_MARK and TAIL_MARK are used to demark the sequence entering or
> +   leaving partitioned execution.  */
> +#define IFN_UNIQUE_OACC_HEAD_MARK 3
> +#define IFN_UNIQUE_OACC_TAIL_MARK 4

Shouldn't these be

Re: [OpenACC 8/11] device-specific lowering

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:49:08PM -0400, Nathan Sidwell wrote:
> This patch is the device-specific half of the previous patch.  It processes
> the partition head & tail markers and loop abstraction functions inserted
> during omp lowering.
> 
> In the oacc_device_lower pass we scan the CFG reconstructing the set of
> nested loops demarked by IFN_UNIQUE (HEAD_MARK) & IFN_UNIQUE (TAIL_MARK)
> functions. The HEAD_MARK function provides  the loop partition information
> provided by the user.  Once constructed we can iterate over that structure
> checking partitioning consistency (for instance an inner loop must use a
> dimension 'inside' an outer loop). We also assign specific partitioning axes
> here.  Partitioning updates the parameters of the IFN_LOOP and IFN_FORK/JOIN
> functions appropriately.
> 
> Once partitioning has been determined, we iterate over the CFG scanning for
> the marker, fork/join and loop functions.  The marker functions are deleted,
> the fork & join functions are conditionally deleted (using the target hook
> of patch 3), and the loop function is expanded into code calculating the
> loop parameters depending on how the loop has been partitioned.  This  uses
> the OACC_DIM_POS and OACC_DIM_SIZE builtins included in patch 7.

So, how do you expand the OACC loops on non-PTX devices (host, or say
XeonPhi)?  Do you drop the IFNs and replace stuff with normal loops?
I don't see anything that would e.g. set the various flags that e.g. OpenMP
#pragma omp simd or Cilk+ #pragma simd sets, like loop->safelen,
loop->force_vectorize, maybe loop->simduid and promote some vars to simduid
arrays if that is relevant to OpenACC.

Jakub


Re: [OpenACC 9/11] oacc_device_lower pass gate

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:50:31PM -0400, Nathan Sidwell wrote:
> 
> This patch is obvious, but included for completeness. We always want to run
> the device lowering pass (when openacc is enabled), in order to delete the
> marker and loop functions that should never be seen after this point.
> 
> nathan

> 2015-10-20  Nathan Sidwell  
> 
>   * omp-low.c (pass_oacc_device_lower::execute): Ignore errors.

Ok.

Jakub


Re: [OpenACC 10/11] remove plugin restriction

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:51:42PM -0400, Nathan Sidwell wrote:
> Here's another obvious patch.  The ptx plugin no longer needs to barf on
> gang or worker dimensions of non-unity.
> 
> nathan
> 
> 

> 2015-10-20  Nathan Sidwell  
> 
>   * plugin/plugin-nvptx.c (nvptx_exec): Remove check on compute
>   dimensions.

Ok.

Jakub


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 03:53:17PM -0400, Nathan Sidwell wrote:
> This patch has some new execution tests, verifying loop partitioning is
> behaving as expected.
> 
> There are more execution tests on the gomp4 branch, but many of them use
> reductions.  We'll merge those once reductions are merged.
> 
> nathan

> 2015-10-20  Nathan Sidwell  
> 
>   * testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-g-1.s: New.

As Ilya mentioned, this one should go.

>   * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: New.
>   * testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: New.

And, I must say I'm at least missing testcases that check parsing but also
runtime behavior of the vector or worker clause arguments (there
is one gang (static:1) clause, but not the other clauses nor other styles of
gang arguments.

Jakub


Re: [gomp4 00/14] NVPTX: further porting

2015-10-22 Thread Jakub Jelinek
On Wed, Oct 21, 2015 at 06:18:25PM +0300, Alexander Monakov wrote:
> On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> 
> > On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > > This patch series ports enough of libgomp.c to get warp-level parallelism
> > > working for OpenMP offloading.  The overall approach is as follows.
> > 
> > Could you elaborate a bit what you mean by this just so we understand each
> > other in terms of terminology? "Warp-level" sounds to me like you have all
> > threads in a warp executing in lockstep at all times. If individual threads
> > can take different paths, I'd expect it to be called thread-level 
> > parallelism
> > or something like that.
> 
> Sorry, that was unclear.  What I meant is that there is a degree of
> parallelism available across different warps, but not across different teams
> (because only 1 team is spawned), nor across threads in a warp (because all
> threads in a warp except one exit immediately -- later on we'd need to
> keep them converged so they can enter a simd region together).
>  
> > What is your end goal in terms of mapping GPU parallelism onto OpenMP?
> 
> OpenMP team is mapped to a CUDA thread block, OpenMP thread is mapped to a
> warp, OpenMP simd lane is mapped to a CUDA thread.  So, follow the OpenACC
> model.  Like in OpenACC, we'd need to artificially deactivate/reactivate warp
> members on simd region boundaires.

Does that apply also to threads within a warp?  I.e. is .local local to each
thread in the warp, or to the whole warp, and if the former, how can say at
the start of a SIMD region or at its end the local vars be broadcast to
other threads and collected back?  One thing is scalar vars, another
pointers, or references to various types, or even bigger indirection.

Jakub


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Bernd Schmidt

On 10/22/2015 10:12 AM, Jakub Jelinek wrote:



@@ -2129,6 +3242,19 @@ nvptx_file_end (void)
FOR_EACH_HASH_TABLE_ELEMENT (*needed_fndecls_htab, decl, tree, iter)
  nvptx_record_fndecl (decl, true);
fputs (func_decls.str().c_str(), asm_out_file);
+
+  if (worker_bcast_hwm)
+{
+  /* Define the broadcast buffer.  */
+
+  worker_bcast_hwm = (worker_bcast_hwm + worker_bcast_align - 1)
+   & ~(worker_bcast_align - 1);
+
+  fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_bcast_name);
+  fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
+  worker_bcast_align,
+  worker_bcast_name, worker_bcast_hwm);
+}


So, is the worker broadcast buffer effectively a file scope .shared
variable?  My worry is that as .shared is quite limited resource, if you
compile many TUs and each allocates its own broadcast buffer you run out of
shared memory.  Is there any way how to share the broadcast buffers in
between different TUs (other than LTO)?


I think LTO is the mechanism, nvptx-lto1 only ever produces one assembly 
file. So I'm not really concerned about this.


One other thing about this occurred to me yesterday - I was worried 
about thread-safety with a single static buffer - couldn't code execute 
multiple kernels at the same time? I googled a bit, and could not 
actually find a definitive answer as to whether all shared memory is 
allocated at kernel launch, or just the dynamic portion?



Bernd


Re: [PATCH] Fix PR67443

2015-10-22 Thread Dominik Vogt
> Eventually we'll get another testcase so I'll leave this for
> comments a while and will commit somewhen later this week.

In case you're referring to my attempt to port the test case to
x86:  All the efforts to reproduce the bug on x86 have failed so
far.  It seems that Gcc is much better in handling bit filed
accesses on little endian targets (x86) than on big endian
(s390[x], power).  In other words, on x86 earlier optimisations
prevent the dse2 pass from having to deal with the problematic
situation.  So, don't wait for me coming up with more test cases;
I'll keep trying, but I'm pessimistic.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] 2015-10-19 Benedikt Huber Philipp Tomsich

2015-10-22 Thread Benedikt Huber
https://gcc.gnu.org/contribute.html states

"If you do not have write access and a patch of yours has been approved, but 
not committed, please advise the approver of that fact. You may want to point 
out lack of write access in your initial submission, too.”

Should I apply for svn write access? And if so, who would approve that?

> On 20 Oct 2015, at 15:34, Marcus Shawcroft  wrote:
> 
> On 4 January 1970 at 00:02, Benedikt Huber
>  wrote:
>>* config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
>>* config/aarch64/aarch64-protos.h: Declare.
>>* config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
>>frsqrts.
>>* config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
>>* config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
>> when
>>applicable.
>>* config/aarch64/aarch64.md: Added enum entries.
>>* config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
>>* testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common 
>> macros for
>>assembly checks.
>>* testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
>>frsqrts and frsqrte are not emitted.
>>* testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
>> and
>>frsqrte are emitted.
>>* testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.
> 
> OK /Marcus



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-10-22 Thread Andreas Schwab
FAIL: gcc.c-torture/compile/pr54713-1.c   -O0  (internal compiler error)

$ gcc/xgcc -Bgcc/ -w -c ../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c: In function ‘f4’:
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 64>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_47 = BIT_FIELD_REF <_8, 32, 64>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 96>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_50 = BIT_FIELD_REF <_8, 32, 96>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 128>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_53 = BIT_FIELD_REF <_8, 32, 128>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 160>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_56 = BIT_FIELD_REF <_8, 32, 160>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 192>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_59 = BIT_FIELD_REF <_8, 32, 192>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: error: position plus 
size exceeds size of referenced object in BIT_FIELD_REF
 f4 (V *p, V *q)
 ^
BIT_FIELD_REF <_8, 32, 224>
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:33:18: note: in statement
   *p = (*p ^ *q) == *q;
  ^
_62 = BIT_FIELD_REF <_8, 32, 224>;
../gcc/testsuite/gcc.c-torture/compile/pr54713-1.c:31:1: internal compiler 
error: verify_gimple failed
 f4 (V *p, V *q)
 ^
0xa7b98b verify_gimple_in_cfg(function*, bool)
../../gcc/tree-cfg.c:5093
0x97f63f execute_function_todo
../../gcc/passes.c:1976
0x97fffb execute_todo
../../gcc/passes.c:2033

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH 1 of 2] print help for undocumented options

2015-10-22 Thread Eric Botcazou
Martin,

some subdirectories have their own ChangeLog file so you need to move...

> 2015-10-21  Martin Sebor  
> 
>  PR driver/68043
>  * opts.c (undocumented_msg, use_diagnosed_msg): New globals.
> (print_filtered_help): Reference aliased option's name and encourage
>  readers to use it in preference to the alias if the former is not
>  documented.  Mention when using an option is diagnosed.
>  * gcc.c (display_help): End each sentence with a period.
> 
>  * ada/gcc-interface/lang.opt: End each sentence that describes
>  an option with a period.

...this one to ada/ChangeLog (without the ada/ prefix).

>  * c-family/c.opt: Same.

...this one to c-family/ChangeLog (without the c-family/ prefix).

>  * fortran/lang.opt: Same.

...this one to fortran/ChangeLog (without the fortran/ prefix).

-- 
Eric Botcazou


Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-10-22 Thread Ilya Enkovich
2015-10-22 13:13 GMT+03:00 Andreas Schwab :
> FAIL: gcc.c-torture/compile/pr54713-1.c   -O0  (internal compiler error)

Can't reproduce it on i386. What's config used?

Ilya


Re: [vec-cmp, patch 1/6] Add optabs for vector comparison

2015-10-22 Thread Ilya Enkovich
2015-10-21 20:25 GMT+03:00 Jeff Law :
> On 10/08/2015 08:52 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This series introduces autogeneration of vector comparison and its support
>> on i386 target.  It lets comparison statements to be vectorized into vector
>> comparison instead of VEC_COND_EXPR.  This allows to avoid some restrictions
>> implied by boolean patterns.  This series applies on top of bolean vectors
>> series [1].
>>
>> This patch introduces optabs for vector comparison.
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00215.html
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-10-08  Ilya Enkovich  
>>
>> * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask
>> results.
>> * optabs-query.h (get_vec_cmp_icode): New.
>> * optabs-tree.c (expand_vec_cmp_expr_p): New.
>> * optabs-tree.h (expand_vec_cmp_expr_p): New.
>> * optabs.c (vector_compare_rtx): Add OPNO arg.
>> (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
>> (expand_vec_cmp_expr): New.
>> * optabs.def (vec_cmp_optab): New.
>> (vec_cmpu_optab): New.
>> * optabs.h (expand_vec_cmp_expr): New.
>> * tree-vect-generic.c (expand_vector_comparison): Add vector
>> comparison optabs check.
>>
>>
>> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
>> index 3b03338..aa863cf 100644
>> --- a/gcc/optabs-tree.c
>> +++ b/gcc/optabs-tree.c
>> @@ -320,6 +320,19 @@ supportable_convert_operation (enum tree_code code,
>> return false;
>>   }
>>
>> +/* Return TRUE if appropriate vector insn is available
>> +   for vector comparison expr with vector type VALUE_TYPE
>> +   and resulting mask with MASK_TYPE.  */
>> +
>> +bool
>> +expand_vec_cmp_expr_p (tree value_type, tree mask_type)
>> +{
>> +  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
>> +   TYPE_MODE (mask_type),
>> +   TYPE_UNSIGNED (value_type));
>> +  return (icode != CODE_FOR_nothing);
>> +}
>> +
>
> Nothing inherently wrong with the code, but it seems like it's in the wrong
> place.  Why optabs-tree rather than optabs-query?

Because it uses tree type for arguments. There is no single tree usage
in optabs-query.c. I think expand_vec_cond_expr_p is in optabs-tree
for the same reason.

Thanks,
Ilya

>
> I think with that fixed this patch will be ready to go onto the trunk.
>
> jeff
>


Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-10-22 Thread Andreas Schwab
Ilya Enkovich  writes:

> 2015-10-22 13:13 GMT+03:00 Andreas Schwab :
>> FAIL: gcc.c-torture/compile/pr54713-1.c   -O0  (internal compiler error)
>
> Can't reproduce it on i386. What's config used?

http://gcc.gnu.org/ml/gcc-testresults/2015-10/msg02350.html
http://gcc.gnu.org/ml/gcc-testresults/2015-10/msg02361.html
http://gcc.gnu.org/ml/gcc-testresults/2015-10/msg02396.html

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-22 Thread Kugan


On 21/10/15 23:45, Richard Biener wrote:
> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>  wrote:
>>
>>
>> On 07/09/15 12:53, Kugan wrote:
>>>
>>> This a new version of the patch posted in
>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>> more testing and spitted the patch to make it more easier to review.
>>> There are still couple of issues to be addressed and I am working on them.
>>>
>>> 1. AARCH64 bootstrap now fails with the commit
>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
>>> in stage2 and fwprop.c is failing. It looks to me that there is a latent
>>> issue which gets exposed my patch. I can also reproduce this in x86_64
>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
>>> time being, I am using  patch
>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>> workaround. This meeds to be fixed before the patches are ready to be
>>> committed.
>>>
>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>> -O3 -g Error: unaligned opcodes detected in executable segment. It works
>>> fine if I remove the -g. I am looking into it and needs to be fixed as well.
>>
>> Hi Richard,
>>
>> Now that stage 1 is going to close, I would like to get these patches
>> accepted for stage1. I will try my best to address your review comments
>> ASAP.
> 
> Ok, can you make the whole patch series available so I can poke at the
> implementation a bit?  Please state the revision it was rebased on
> (or point me to a git/svn branch the work resides on).
> 

Thanks. Please find the patched rebated against trunk@229156. I have
skipped the test-case readjustment patches.


Thanks,
Kugan
>From 2dc1cccfc59ae6967928b52396227b52a50803d9 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:54:31 +1100
Subject: [PATCH 4/4] debug stmt in widen mode

---
 gcc/gimple-ssa-type-promote.c | 82 +--
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-ssa-type-promote.c b/gcc/gimple-ssa-type-promote.c
index e62a7c6..c0b6aa1 100644
--- a/gcc/gimple-ssa-type-promote.c
+++ b/gcc/gimple-ssa-type-promote.c
@@ -589,10 +589,86 @@ fixup_uses (tree use, tree promoted_type, tree old_type)
 	{
 	case GIMPLE_DEBUG:
 	{
-	  gsi = gsi_for_stmt (stmt);
-	  gsi_remove (&gsi, true);
-	  break;
+	  /* Change the GIMPLE_DEBUG stmt such that the value bound is
+		 computed in promoted_type and then converted to required
+		 type.  */
+	  tree op, new_op = NULL_TREE;
+	  gdebug *copy = NULL, *gs = as_a  (stmt);
+	  enum tree_code code;
+
+	  /* Get the value that is bound in debug stmt.  */
+	  switch (gs->subcode)
+		{
+		case GIMPLE_DEBUG_BIND:
+		  op = gimple_debug_bind_get_value (gs);
+		  break;
+		case GIMPLE_DEBUG_SOURCE_BIND:
+		  op = gimple_debug_source_bind_get_value (gs);
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+
+	  code = TREE_CODE (op);
+	  /* Convert the value computed in promoted_type to
+		 old_type.  */
+	  if (code == SSA_NAME && use == op)
+		new_op = build1 (NOP_EXPR, old_type, use);
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_unary
+		   && code != NOP_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  if (op0 == use)
+		{
+		  tree temp = build1 (code, promoted_type, op0);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+	  else if (TREE_CODE_CLASS (TREE_CODE (op)) == tcc_binary
+		   /* Skip codes that are rejected in safe_to_promote_use_p.  */
+		   && code != LROTATE_EXPR
+		   && code != RROTATE_EXPR
+		   && code != COMPLEX_EXPR)
+		{
+		  tree op0 = TREE_OPERAND (op, 0);
+		  tree op1 = TREE_OPERAND (op, 1);
+		  if (op0 == use || op1 == use)
+		{
+		  if (TREE_CODE (op0) == INTEGER_CST)
+			op0 = convert_int_cst (promoted_type, op0, SIGNED);
+		  if (TREE_CODE (op1) == INTEGER_CST)
+			op1 = convert_int_cst (promoted_type, op1, SIGNED);
+		  tree temp = build2 (code, promoted_type, op0, op1);
+		  new_op = build1 (NOP_EXPR, old_type, temp);
+		}
+		}
+
+	  /* Create new GIMPLE_DEBUG stmt with the new value (new_op) to
+		 be bound, if new value has been calculated */
+	  if (new_op)
+		{
+		  if (gimple_debug_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_bind
+			(gimple_debug_bind_get_var (stmt),
+			 new_op,
+			 stmt);
+		}
+		  if (gimple_debug_source_bind_p (stmt))
+		{
+		  copy = gimple_build_debug_source_bind
+			(gimple_debug_source_bind_get_var (stmt), new_op,
+			 stmt);
+		}
+
+		  if (copy)
+		{
+		  gsi = gsi_for_stmt (stmt);
+		  gsi_replace (&gsi, copy, false);
+		}
+		}
 	}
+	  break;
 
 	case GIMPLE_ASM:
 	case GIMPLE_CALL:
-- 
1.9.1

>From 1044b1b5ebf8ad696a942207b031e3668ab2a0de Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 22 Oct 2015 10:53:56 +1100
Subject: [PATCH 3/4] 

Move fold_trunc_transparent_mathfn to match.pd

2015-10-22 Thread Richard Sandiford
This moves the fold rules for trunc, floor, ceil, round, nearbyint and
rint in one go, since they're tested as a group.  Most of the code is
supporting the f(x)->x fold when x is known to be integer-valued.
Like with the non-negative test, this is probably more elegantly handled
by tracking range information for reals, but until that happens, I think
we should handle it analogously to tree_expr_nonnegative_p.

I've incorporated the fix for PR68031 in the new version of
integer_valued_real_p.  However, it seemed confusing to test for an
SSA name at the head of the function rather than the case statement,
and not fall through to tree_simple_nonnegative_warnv_p (which
conceptually shouldn't care whether an update is in progress).
But tree_simple_nonnegative_warnv_p is a no-op for SSA names,
so I simply changed it to:

  return (!name_registered_for_update_p (t)
  && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
  && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
  strict_overflow_p, depth));

and used that in the new code too.

Doing these folds later meant that IPA would start to use information
about the aborting sinf and floor in 20030125-1.c before the folds
kicked in.  I changed them from noinline to weak to stop that.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (integer_valued_real_p): Move to fold-const.c.
(fold_trunc_transparent_mathfn, fold_builtin_trunc, fold_builtin_floor)
(fold_builtin_ceil, fold_builtin_round): Delete.
(fold_builtin_1): Handle constant trunc, floor, ceil and round
arguments here.
* fold-const.h (integer_valued_real_unary_p)
(integer_valued_real_binary_p, integer_valued_real_call_p)
(integer_valued_real_single_p, integer_valued_real_p): Declare.
* fold-const.c (tree_single_nonnegative_warnv_p): Move
name_registered_for_update_p check to SSA_NAME case statement.
Don't call tree_simple_nonnegative_warnv_p for SSA names.
(integer_valued_real_unary_p, integer_valued_real_binary_p)
(integer_valued_real_call_p, integer_valued_real_single_p)
(integer_valued_real_invalid_p): New functions.
(integer_valued_real_p): Move from fold-const.c and rework
to call the functions above.  Handle SSA names.
* gimple-fold.h (gimple_stmt_integer_valued_real_p): Declare.
* gimple-fold.c (gimple_assign_integer_valued_real_p)
(gimple_call_integer_valued_real_p, gimple_phi_integer_valued_real_p)
(gimple_stmt_integer_valued_real_p): New functions.
* match.pd: Fold f(f(x))->f(x) for fp->fp rounding functions f.
Fold f(x)->x for the same f if x is known to be integer-valued.
Fold f(extend(x))->extend(f'(x)) if doing so doesn't affect
the result.  Canonicalize floor(x) as trunc(x) if x is
nonnegative.

gcc/testsuite/
* gcc.c-torture/execute/20030125-1.c (floor, floorf, sin, sinf):
Make weak rather than noinline.
* gcc.dg/builtins-57.c: Compile with -O.
* gcc.dg/torture/builtin-integral-1.c: Skip for -O0.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e5e65ba..c70bbfd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -154,16 +154,10 @@ static tree fold_builtin_inf (location_t, tree, int);
 static tree fold_builtin_nan (tree, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
-static bool integer_valued_real_p (tree);
-static tree fold_trunc_transparent_mathfn (location_t, tree, tree);
 static rtx expand_builtin_fabs (tree, rtx, rtx);
 static rtx expand_builtin_signbit (tree, rtx);
 static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
 static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
-static tree fold_builtin_trunc (location_t, tree, tree);
-static tree fold_builtin_floor (location_t, tree, tree);
-static tree fold_builtin_ceil (location_t, tree, tree);
-static tree fold_builtin_round (location_t, tree, tree);
 static tree fold_builtin_int_roundingfn (location_t, tree, tree);
 static tree fold_builtin_bitop (tree, tree);
 static tree fold_builtin_strchr (location_t, tree, tree, tree);
@@ -7320,117 +7314,6 @@ fold_builtin_nan (tree arg, tree type, int quiet)
   return build_real (type, real);
 }
 
-/* Return true if the floating point expression T has an integer value.
-   We also allow +Inf, -Inf and NaN to be considered integer values.  */
-
-static bool
-integer_valued_real_p (tree t)
-{
-  switch (TREE_CODE (t))
-{
-case FLOAT_EXPR:
-  return true;
-
-case ABS_EXPR:
-case SAVE_EXPR:
-  return integer_valued_real_p (TREE_OPERAND (t, 0));
-
-case COMPOUND_EXPR:
-case MODIFY_EXPR:
-case BIND_EXPR:
-  return integer_valued_real_p (TREE_OPERAND (t, 1

Re: [PATCH] Pass manager: add support for termination of pass list

2015-10-22 Thread Martin Liška
On 10/21/2015 04:06 PM, Richard Biener wrote:
> On Wed, Oct 21, 2015 at 1:24 PM, Martin Liška  wrote:
>> On 10/21/2015 11:59 AM, Richard Biener wrote:
>>> On Wed, Oct 21, 2015 at 11:19 AM, Martin Liška  wrote:
 On 10/20/2015 03:39 PM, Richard Biener wrote:
> On Tue, Oct 20, 2015 at 3:00 PM, Martin Liška  wrote:
>> Hello.
>>
>> As part of upcoming merge of HSA branch, we would like to have 
>> possibility to terminate
>> pass manager after execution of the HSA generation pass. The HSA 
>> back-end is implemented
>> as a tree pass that directly emits HSAIL from gimple tree 
>> representation. The pass operates
>> on clones created by HSA IPA pass and the pass manager should stop 
>> execution of further
>> RTL passes.
>>
>> Suggested patch survives bootstrap and regression tests on 
>> x86_64-linux-pc.
>>
>> What do you think about it?
>
> Are you sure it works this way?
>
> Btw, you will miss executing of all the cleanup passes that will
> eventually free memory
> associated with the function.  So I'd rather support a
> TODO_discard_function which
> should basically release the body from the cgraph.

 Hi.

 Agree with you that I should execute all TODOs, which can be easily done.
 However, if I just try to introduce the suggested TODO and handle it 
 properly
 by calling cgraph_node::release_body, then for instance fn->gimple_df, 
 fn->cfg are
 released and I hit ICEs on many places.

 Stopping the pass manager looks necessary, or do I miss something?
>>>
>>> "Stopping the pass manager" is necessary after TODO_discard_function, yes.
>>> But that may be simply done via a has_body () check then?
>>
>> Thanks, there's second version of the patch. I'm going to start regression 
>> tests.
> 
> As release_body () will free cfun you should pop_cfun () before
> calling it (and thus

Well, release_function_body calls both push & pop, so I think calling pop
before cgraph_node::release_body is not necessary.

If tried to call pop_cfun before cgraph_node::release_body, I have cfun still
pointing to the same (function *) (which is gcc_freed, but cfun != NULL).

> drop its modification).  Also TODO_discard_functiuon should be only set for
> local passes thus you should probably add a gcc_assert (cfun).
> I'd move its handling earlier, definitely before the ggc_collect, eventually
> before the pass_fini_dump_file () (do we want a last dump of the
> function or not?).

Fully agree, moved here.

> 
> @@ -2397,6 +2410,10 @@ execute_pass_list_1 (opt_pass *pass)
>  {
>gcc_assert (pass->type == GIMPLE_PASS
>   || pass->type == RTL_PASS);
> +
> +
> +  if (!gimple_has_body_p (current_function_decl))
> +   return;
> 
> too much vertical space.  With popping cfun before releasing the body the 
> check
> might just become if (!cfun) and

As mentioned above, as release body is symmetric (calling push & pop), the 
suggested
guard will not work. 

> 
> @@ -2409,7 +2426,7 @@ execute_pass_list (function *fn, opt_pass *pass)
>  {
>push_cfun (fn);
>execute_pass_list_1 (pass);
> -  if (fn->cfg)
> +  if (gimple_has_body_p (current_function_decl) && fn->cfg)
>  {
>free_dominance_info (CDI_DOMINATORS);
>free_dominance_info (CDI_POST_DOMINATORS);
> 
> here you'd need to guard the pop_cfun call on cfun != NULL.  IMHO it's better
> to not let cfun point to deallocated data.

As I've read the code, since we gcc_free 'struct function', one can just rely on
gimple_has_body_p (current_function_decl) as it correctly realizes that
DECL_STRUCT_FUNCTION (current_function_decl) == NULL.

I'm attaching v3.

Thanks,
Martin

> 
> Otherwise looks good to me.
> 
> Richard.
> 
> 
>> Martin
>>
>>>
 Thanks,
 Martin

>
> Richard.
>
>> Thanks,
>> Martin

>>

>From d299b4c3c10432cda71c0aeb1becc9058ae818c1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 22 Oct 2015 12:55:00 +0200
Subject: [PATCH] Pass manager: add support for termination of pass list

gcc/ChangeLog:

2015-10-21  Martin Liska  

	* function.c (pop_cfun): Add new condition to checking assert.
	* passes.c (execute_one_pass): Handle TODO_discard_function.
	(execute_pass_list_1): Stop pass manager if a function
	has release body.
	(execute_pass_list): Free dominators info just for functions
	that have gimple body.
	* tree-pass.h (TODO_discard_function): Introduce new TODO.
---
 gcc/function.c  |  1 +
 gcc/passes.c| 19 ++-
 gcc/tree-pass.h |  3 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/gcc/function.c b/gcc/function.c
index f774214..a829d71 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4770,6 +4770,7 @@ pop_cfun (void)
  pop_cfun.  */
   gcc_checking_assert (in_dummy_function
 		   || !cfun
+		   || !gimple_has_body_p (current_function_decl)
 		   || current_function_decl == cfun->decl);
   set

Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Julian Brown
On Thu, 22 Oct 2015 10:05:30 +0200
Richard Biener  wrote:

> On Thu, Oct 22, 2015 at 9:59 AM, Jakub Jelinek 
> wrote:
> > On Thu, Oct 22, 2015 at 09:49:29AM +0200, Richard Biener wrote:  
> >> >> Jakub, IYR I originally had IFN_FORK and IFN_JOIN as such
> >> >> distinct internal fns.  This replaces that scheme.
> >> >>
> >> >> ok?  
> >> >
> >> > Hmm, I'd just have used gimple_has_volatile_ops on the call?
> >> > That should have the
> >> > desired effects.  
> >>
> >> That is, whatever new IFNs you need are ok, but special-casing
> >> them is not necessary if you properly mark the calls as volatile.  
> >
> > I don't see gimple_has_volatile_ops used in tracer.c or
> > tree-ssa-threadedge.c.  Setting gimple_has_volatile_ops on those
> > IFNs is fine, but I think they are even stronger than that.  
> 
> Hmm, indeed.  Now I fail to see how the implemented property
> "preserves the CFG looping structure".  And I would have expected
> can_copy_bbs_p to be adjusted instead (catching more cases and the
> threading and tracer case as well).
> 
> As far as I can see nothing would prevent dissolving the loop by
> completely unolling it for example.  Or deleting it because it has no
> side-effects.
> 
> So you'd need to be more precise as to what properties you are trying
> to preserve by placing a single stmt somewhere.

FWIW an earlier, abandoned attempt at solving the same problem was
discussed in the following thread, continuing through June:

  https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02647.html

Though the details of lowering of OpenACC constructs have changed with
Nathan's current patches, the underlying problem remains the same. PTX
requires certain operations (bar.sync) to be executed uniformly by all
threads in a CTA. IIUC this affects "JOIN" points across all
workers/vectors in a gang, in particular (though this is generic code,
other -- particularly GPU -- targets may have similar restrictions).

HTH,

Julian


[PATCH] Fix PR68046 (part of PR61893)

2015-10-22 Thread Richard Biener

This splits out the non-constant folding part of an earlier approved patch 
to make -ftrapv work better.

The constant folding case is still broken (also ubsan doesn't handle it
correctly).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-10-22  Richard Biener  

PR middle-end/68046
PR middle-end/61893
* optabs.c (emit_libcall_block_1): Allow a NULL_RTX equiv.
(expand_binop): For -ftrapv optabs do not record an REG_EQUAL note.
(expand_unop): Likewise.

* gcc.dg/torture/ftrapv-2.c: New testcase.

Index: gcc/optabs.c
===
--- gcc/optabs.c(revision 229166)
+++ gcc/optabs.c(working copy)
@@ -1748,11 +1748,12 @@ expand_binop (machine_mode mode, optab b
   insns = get_insns ();
   end_sequence ();
 
+  bool trapv = trapv_binoptab_p (binoptab);
   target = gen_reg_rtx (mode);
   emit_libcall_block_1 (insns, target, value,
-   gen_rtx_fmt_ee (optab_to_code (binoptab),
-   mode, op0, op1),
-   trapv_binoptab_p (binoptab));
+   trapv ? NULL_RTX
+   : gen_rtx_fmt_ee (optab_to_code (binoptab),
+ mode, op0, op1), trapv);
 
   return target;
 }
@@ -2880,13 +2881,19 @@ expand_unop (machine_mode mode, optab un
   end_sequence ();
 
   target = gen_reg_rtx (outmode);
-  eq_value = gen_rtx_fmt_e (optab_to_code (unoptab), mode, op0);
-  if (GET_MODE_SIZE (outmode) < GET_MODE_SIZE (mode))
-   eq_value = simplify_gen_unary (TRUNCATE, outmode, eq_value, mode);
-  else if (GET_MODE_SIZE (outmode) > GET_MODE_SIZE (mode))
-   eq_value = simplify_gen_unary (ZERO_EXTEND, outmode, eq_value, mode);
-  emit_libcall_block_1 (insns, target, value, eq_value,
-   trapv_unoptab_p (unoptab));
+  bool trapv = trapv_unoptab_p (unoptab);
+  if (trapv)
+   eq_value = NULL_RTX;
+  else
+   {
+ eq_value = gen_rtx_fmt_e (optab_to_code (unoptab), mode, op0);
+ if (GET_MODE_SIZE (outmode) < GET_MODE_SIZE (mode))
+   eq_value = simplify_gen_unary (TRUNCATE, outmode, eq_value, mode);
+ else if (GET_MODE_SIZE (outmode) > GET_MODE_SIZE (mode))
+   eq_value = simplify_gen_unary (ZERO_EXTEND,
+  outmode, eq_value, mode);
+   }
+  emit_libcall_block_1 (insns, target, value, eq_value, trapv);
 
   return target;
 }
@@ -3573,7 +3580,8 @@ emit_libcall_block_1 (rtx_insn *insns, r
 }
 
   last = emit_move_insn (target, result);
-  set_dst_reg_note (last, REG_EQUAL, copy_rtx (equiv), target);
+  if (equiv)
+set_dst_reg_note (last, REG_EQUAL, copy_rtx (equiv), target);
 
   if (final_dest != target)
 emit_move_insn (final_dest, target);
Index: gcc/testsuite/gcc.dg/torture/ftrapv-2.c
===
*** gcc/testsuite/gcc.dg/torture/ftrapv-2.c (revision 0)
--- gcc/testsuite/gcc.dg/torture/ftrapv-2.c (working copy)
***
*** 0 
--- 1,32 
+ /* { dg-do run } */
+ /* With -flto this degenerates to constant folding which doesn't work.  */
+ /* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+ /* { dg-additional-options "-ftrapv" } */
+ /* { dg-require-effective-target trapping } */
+ /* { dg-require-fork unused } */
+ 
+ #include 
+ #include 
+ #include 
+ #include 
+ 
+ /* Verify SImode operations properly trap.  PR middle-end/68046 */
+ 
+ int i = 0x7fff;
+ 
+ int main(void)
+ {
+   pid_t child = fork ();
+   int status = 0;
+   if (child == 0)
+ {
+   volatile int x = i + 1 < i;
+   exit (0);
+ }
+   else if (child == -1)
+ return 0;
+   if (wait (&status) == child 
+   && status == 0)
+ abort ();
+   return 0;
+ }


Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-10-22 Thread Kirill Yukhin
Hello,
On 15 Oct 17:47, Kirill Yukhin wrote:
> Hi Jakub,
> On 15 Oct 16:39, Jakub Jelinek wrote:
> > On Thu, Oct 15, 2015 at 05:33:32PM +0300, Kirill Yukhin wrote:
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -3066,6 +3066,20 @@ This function attribute make a stack protection of 
> > > the function if
> > >  flags @option{fstack-protector} or @option{fstack-protector-strong}
> > >  or @option{fstack-protector-explicit} are set.
> > >  
> > > +@item simd
> > > +@cindex @code{simd} function attribute.
> > > +This attribute enables creation of one or more function versions that
> > > +can process multiple arguments using SIMD instructions from a
> > > +single invocation.  Specifying this attribute allows compiler to
> > > +assume that such a versions are available at link time (provided
> > > +in the same or another translation unit).  Generated versions are
> > > +target dependent and described in corresponding Vector ABI document.  For
> > > +x86_64 target this document can be found
> > > +@w{@uref{https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt,here}}.
> > > +It is prohibited to use the attribute along with Cilk Plus's 
> > > @code{vector}
> > > +attribute. If the attribute is specified and @code{#pragma omp declare 
> > > simd}
> > > +presented on a declaration, then the attribute is ignored.
> > 
> > Is that what is implemented?  I mean, isn't the argument ignored only
> > if #pragma omp declare simd is present and -fopenmp or -fopenmp-simd
> > passed on the command line, because otherwise the pragma is ignored?
> You're right. The attribute is ignored if `pragma omp declare simd' is
> got in charge. Which happens only if -fopenmp or -fopenmp-simd is
> are passed. 
> Updated hunk for doc/extend.texi is in the bottom.
Ping?

--
Thanks, K


Re: Move fold_trunc_transparent_mathfn to match.pd

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 1:00 PM, Richard Sandiford
 wrote:
> This moves the fold rules for trunc, floor, ceil, round, nearbyint and
> rint in one go, since they're tested as a group.  Most of the code is
> supporting the f(x)->x fold when x is known to be integer-valued.
> Like with the non-negative test, this is probably more elegantly handled
> by tracking range information for reals, but until that happens, I think
> we should handle it analogously to tree_expr_nonnegative_p.
>
> I've incorporated the fix for PR68031 in the new version of
> integer_valued_real_p.  However, it seemed confusing to test for an
> SSA name at the head of the function rather than the case statement,
> and not fall through to tree_simple_nonnegative_warnv_p (which
> conceptually shouldn't care whether an update is in progress).
> But tree_simple_nonnegative_warnv_p is a no-op for SSA names,
> so I simply changed it to:
>
>   return (!name_registered_for_update_p (t)
>   && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
>   && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
>   strict_overflow_p, depth));
>
> and used that in the new code too.
>
> Doing these folds later meant that IPA would start to use information
> about the aborting sinf and floor in 20030125-1.c before the folds
> kicked in.  I changed them from noinline to weak to stop that.
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> * builtins.c (integer_valued_real_p): Move to fold-const.c.
> (fold_trunc_transparent_mathfn, fold_builtin_trunc, 
> fold_builtin_floor)
> (fold_builtin_ceil, fold_builtin_round): Delete.
> (fold_builtin_1): Handle constant trunc, floor, ceil and round
> arguments here.
> * fold-const.h (integer_valued_real_unary_p)
> (integer_valued_real_binary_p, integer_valued_real_call_p)
> (integer_valued_real_single_p, integer_valued_real_p): Declare.
> * fold-const.c (tree_single_nonnegative_warnv_p): Move
> name_registered_for_update_p check to SSA_NAME case statement.
> Don't call tree_simple_nonnegative_warnv_p for SSA names.
> (integer_valued_real_unary_p, integer_valued_real_binary_p)
> (integer_valued_real_call_p, integer_valued_real_single_p)
> (integer_valued_real_invalid_p): New functions.
> (integer_valued_real_p): Move from fold-const.c and rework
> to call the functions above.  Handle SSA names.
> * gimple-fold.h (gimple_stmt_integer_valued_real_p): Declare.
> * gimple-fold.c (gimple_assign_integer_valued_real_p)
> (gimple_call_integer_valued_real_p, gimple_phi_integer_valued_real_p)
> (gimple_stmt_integer_valued_real_p): New functions.
> * match.pd: Fold f(f(x))->f(x) for fp->fp rounding functions f.
> Fold f(x)->x for the same f if x is known to be integer-valued.
> Fold f(extend(x))->extend(f'(x)) if doing so doesn't affect
> the result.  Canonicalize floor(x) as trunc(x) if x is
> nonnegative.
>
> gcc/testsuite/
> * gcc.c-torture/execute/20030125-1.c (floor, floorf, sin, sinf):
> Make weak rather than noinline.
> * gcc.dg/builtins-57.c: Compile with -O.
> * gcc.dg/torture/builtin-integral-1.c: Skip for -O0.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index e5e65ba..c70bbfd 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -154,16 +154,10 @@ static tree fold_builtin_inf (location_t, tree, int);
>  static tree fold_builtin_nan (tree, tree, int);
>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>  static bool validate_arg (const_tree, enum tree_code code);
> -static bool integer_valued_real_p (tree);
> -static tree fold_trunc_transparent_mathfn (location_t, tree, tree);
>  static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_trunc (location_t, tree, tree);
> -static tree fold_builtin_floor (location_t, tree, tree);
> -static tree fold_builtin_ceil (location_t, tree, tree);
> -static tree fold_builtin_round (location_t, tree, tree);
>  static tree fold_builtin_int_roundingfn (location_t, tree, tree);
>  static tree fold_builtin_bitop (tree, tree);
>  static tree fold_builtin_strchr (location_t, tree, tree, tree);
> @@ -7320,117 +7314,6 @@ fold_builtin_nan (tree arg, tree type, int quiet)
>return build_real (type, real);
>  }
>
> -/* Return true if the floating point expression T has an integer value.
> -   We also allow +Inf, -Inf and NaN to be considered integer values.  */
> -
> -static bool
> -integer_valued_real_p (tree t)
> -{
> -  switch (TREE_CODE (t))
> -{
> -case FLOAT_EXPR

Re: Don't dump low gimple functions in gimple dump

2015-10-22 Thread Jakub Jelinek
On Thu, Jun 04, 2015 at 05:02:42PM +0200, Tom de Vries wrote:
> >So why does add_new_function not do the dumping and to the correct
> >place?  That is,
> >you are dumping things twice here, once in omp expansion and then again when 
> >the
> >new function reaches omp expansion?
> >
> 
> Dumping twice doesn't happen for omp-annotated source code. But indeed,
> dumping twice does happen in ompexpssa for the parloops/ompexpssa case with
> this patch, which is confusing.  And even if we would eliminate the dumping
> when the new function reaches omp expansion, still it would be confusing
> because the dump of the function would not be the version inbetween the
> preceding and succeeding dump.

I've committed following patch to gomp-4_5-branch for this.

The state before the patch is:
1) the omp_fn children created during the pre-SSA ompexp pass are dumped
   first in the *.ssa dump and in all the following ones (these are created
   as low gimple, non-SSA)
2) the omp_cpyfn children created during the pre-SSA ompexp pass are dumped
   first in the *.omplower dump and in all the following ones (these are
   created as high gimple)
3) the omp_fn children created during the parloops/ompexpssa passes
   are dumped first post-IPA (or during IPA?) and in all the following ones
   (these are created as SSA gimple)

2) and 3) is fine, on 1) I don't like the fact that one can see the child
functions only in the SSA form, not before it.  Thus, following patch
arranges for only the 1) case to change, those functions are dumped now
into the *.ompexp dump.  As e.g. for C++ sometimes the dumped function name
is not too descriptive (e.g. ), I've hacked things up so that also
the
;; Function
headers are dumped including the assembler name, and because there can be
many child functions emitted for a single original function, I chose to
dump again the
;; Function
header for the original function if any child functions were dumped.
The original function will have then two headers in the dump file, but I
think it is more readable that way.

2015-10-22  Jakub Jelinek  

* omp-low.c: Include tree-pretty-print.h.
(omp_any_child_fn_dumped): New variable.
(expand_omp_taskreg, expand_omp_target): Call
assign_assembler_name_if_neeeded on child_fn if current_function_decl
has assembler name set, but child_fn does not.  Dump the header
and IL of the child function when not in SSA form.
(expand_omp): Clear omp_any_child_fn_dumped.  Dump function header
again if we have dumped any child functions.

--- gcc/omp-low.c.jj2015-10-22 13:41:23.279881315 +0200
+++ gcc/omp-low.c   2015-10-22 14:11:23.488003543 +0200
@@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.
 #include "context.h"
 #include "lto-section-names.h"
 #include "gomp-constants.h"
+#include "tree-pretty-print.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
phases.  The first phase scans the function looking for OMP statements
@@ -244,6 +245,7 @@ static int target_nesting_level;
 static struct omp_region *root_omp_region;
 static bitmap task_shared_vars;
 static vec taskreg_contexts;
+static bool omp_any_child_fn_dumped;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
@@ -6739,9 +6741,15 @@ expand_omp_taskreg (struct omp_region *r
   node->parallelized_function = 1;
   cgraph_node::add_new_function (child_fn, true);
 
+  bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl)
+ && !DECL_ASSEMBLER_NAME_SET_P (child_fn);
+
   /* Fix the callgraph edges for child_cfun.  Those for cfun will be
 fixed in a following pass.  */
   push_cfun (child_cfun);
+  if (need_asm)
+   assign_assembler_name_if_neeeded (child_fn);
+
   if (optimize)
optimize_omp_library_calls (entry_stmt);
   cgraph_edge::rebuild_edges ();
@@ -6767,6 +6775,13 @@ expand_omp_taskreg (struct omp_region *r
verify_loop_structure ();
 #endif
   pop_cfun ();
+
+  if (dump_file && !gimple_in_ssa_p (cfun))
+   {
+ omp_any_child_fn_dumped = true;
+ dump_function_header (dump_file, child_fn, dump_flags);
+ dump_function_to_file (child_fn, dump_file, dump_flags);
+   }
 }
 
   /* Emit a library call to launch the children threads.  */
@@ -11575,9 +11590,14 @@ expand_omp_target (struct omp_region *re
   vec_safe_push (offload_funcs, child_fn);
 #endif
 
+  bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl)
+ && !DECL_ASSEMBLER_NAME_SET_P (child_fn);
+
   /* Fix the callgraph edges for child_cfun.  Those for cfun will be
 fixed in a following pass.  */
   push_cfun (child_cfun);
+  if (need_asm)
+   assign_assembler_name_if_neeeded (child_fn);
   cgraph_edge::rebuild_edges ();
 
 #ifdef ENABLE_OFFLOADING
@@ -11605,6 +11625,13 @@ expand_omp_target (struct omp_region *

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-22 Thread Richard Biener
On Wed, Oct 21, 2015 at 7:55 PM, Richard Henderson  wrote:
> On 10/21/2015 03:56 AM, Richard Biener wrote:
>>
>> On Wed, Oct 21, 2015 at 2:45 PM, Richard Biener
>>  wrote:
>>>
>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>  wrote:



 On 07/09/15 12:53, Kugan wrote:
>
>
> This a new version of the patch posted in
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
> more testing and spitted the patch to make it more easier to review.
> There are still couple of issues to be addressed and I am working on
> them.
>
> 1. AARCH64 bootstrap now fails with the commit
> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
> mis-compiled
> in stage2 and fwprop.c is failing. It looks to me that there is a
> latent
> issue which gets exposed my patch. I can also reproduce this in x86_64
> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
> time being, I am using  patch
> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
> workaround. This meeds to be fixed before the patches are ready to be
> committed.
>
> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
> -O3 -g Error: unaligned opcodes detected in executable segment. It
> works
> fine if I remove the -g. I am looking into it and needs to be fixed as
> well.


 Hi Richard,

 Now that stage 1 is going to close, I would like to get these patches
 accepted for stage1. I will try my best to address your review comments
 ASAP.
>>>
>>>
>>> Ok, can you make the whole patch series available so I can poke at the
>>> implementation a bit?  Please state the revision it was rebased on
>>> (or point me to a git/svn branch the work resides on).
>>>
 * Issue 1 above (AARCH64 bootstrap now fails with the commit) is no
 longer present as it is fixed in trunk. Patch-6 is no longer needed.

 * Issue 2 is also reported as known issue

 *  Promotion of PARM_DECLs and RESULT_DECLs in IPA pass and patterns in
 match.pd for SEXT_EXPR, I would like to propose them as a follow up
 patch once this is accepted.
>>>
>>>
>>> I thought more about this and don't think it can be made work without a
>>> lot of
>>> hassle.  Instead to get rid of the remaining "badly" typed registers in
>>> the
>>> function we can key different type requirements on a pass property
>>> (PROP_promoted_regs), thus simply change the expectation of the
>>> types of function parameters / results according to their promotion.
>>
>>
>> Or maybe we should simply make GIMPLE _always_ adhere to the ABI
>> details from the start (gimplification).  Note that this does not only
>> involve
>> PROMOTE_MODE.  Note that for what GIMPLE is concerned I'd only
>> "lower" passing / returning in registers (whee, and then we have
>> things like targetm.calls.split_complex_arg ... not to mention passing
>> GIMPLE memory in registers).
>>
>> Maybe I'm shooting too far here in the attempt to make GIMPLE closer
>> to the target (to expose those redundant extensions on GIMPLE) and
>> we'll end up with a bigger mess than with not doing this?
>
>
> I'm leary of building this in as early as gimplification, lest we get into
> trouble with splitting out bits of the current function for off-loading.
> What happens when the cpu and gpu have different promotion rules?

Ah, of course.  I tend to forget these issues.

Richard.

>
> r~


Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-10-22 Thread Joseph Myers
On Thu, 22 Oct 2015, Kirill Yukhin wrote:

> Ping?

You need to update this patch to take account of Marek's fix for bug 67964 
(it was because I was suspicious of the "continue;" in this patch 
accepting invalid syntax that I found that bug), retest and resubmit.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [OpenACC 7/11] execution model

2015-10-22 Thread Nathan Sidwell

On 10/22/15 05:23, Jakub Jelinek wrote:

On Wed, Oct 21, 2015 at 03:42:26PM -0400, Nathan Sidwell wrote:

+/*  Flags for an OpenACC loop.  */
+
+enum oacc_loop_flags
+  {


Weird formatting.  I see either


Blame emacs (I thought  it was configured for GNU formatting ...)



+  expr = build2 (TRUNC_MOD_EXPR, ivar_type, ivar,
+fold_convert (ivar_type, collapse->iters));
+  expr = build2 (MULT_EXPR, diff_type, fold_convert (diff_type, expr),
+collapse->step);
+  expr = build2 (plus_code, iter_type, collapse->base,
+fold_convert (plus_type, expr));


Shouldn't these be fold_build2 instead?


I don't think fold_build2 makes a difference here, as (at least) one operand is 
 a variable?



+/* FORK and JOIN mark the points at which OpenACC partitioned
+   execution is entered or exited.  They take an INTEGER_CST argument,
+   indicating the axis of forking or joining and return nothing.  */
+#define IFN_UNIQUE_OACC_FORK 1
+#define IFN_UNIQUE_OACC_JOIN 2
+/* HEAD_MARK and TAIL_MARK are used to demark the sequence entering or
+   leaving partitioned execution.  */
+#define IFN_UNIQUE_OACC_HEAD_MARK 3
+#define IFN_UNIQUE_OACC_TAIL_MARK 4


Shouldn't these be in an enum, to make debugging easier?


internal-fn.def can be included multiple times in one file (probably only 
internal-fn.c).  Thus an enum would either need to go somewhere else (and I'd 
like to keep it close to the ifn def), or need to be protected in some manner. 
Hence I went with #defs, which are safe to duplicate.  Any thoughts on how to 
resolve that contention?


nathan



Re: Don't dump low gimple functions in gimple dump

2015-10-22 Thread Tom de Vries

On 22/10/15 14:27, Jakub Jelinek wrote:

The state before the patch is:
1) the omp_fn children created during the pre-SSA ompexp pass are dumped
first in the *.ssa dump and in all the following ones (these are created
as low gimple, non-SSA)


Hi,

I do see those child fns before the ssa dump, in the fixup_cfg1 dump 
(and with -fipa-tree-all, also in the visibility dump, just after the 
ompexp dump).


[ Not that I disagree with dumping the child fn in ompexp dump. ]


2) the omp_cpyfn children created during the pre-SSA ompexp pass are dumped
first in the *.omplower dump and in all the following ones (these are
created as high gimple)
3) the omp_fn children created during the parloops/ompexpssa passes
are dumped first post-IPA (or during IPA?) and in all the following ones
(these are created as SSA gimple)




3) is fine


Not fine for me though.

As I wrote earlier ( 
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01316.html ):

...
The problems I'm trying to solve with this patch are the following:
- even if you're compiling a single function, part of the function
  occurs twice (once in the original, once in the split-off
  function), making formulating scan-tree-dumps more complicated for
  parloops.
- the intuitive thing is to look in the parloops tree-dump and look at
  the original function and the split-off function, and think that the
  split-off function is the immediate result of the split-off that
  happened in the pass, which is incorrect (since it jumps back in the
  pass list and traverses other passes before arriving back at
  parloops).

Adding something like a flag -fno-dump-new-function would solve the 
first problem, but not the second.


We could also dump new functions in a different dump file 
src.c.new.123t.pass, and this would solve both problems. But it will 
cause the 'where did that function go' confusion, certainly initially.

...

Thanks,
- Tom



Re: [OpenACC 8/11] device-specific lowering

2015-10-22 Thread Nathan Sidwell

On 10/22/15 05:31, Jakub Jelinek wrote:

On Wed, Oct 21, 2015 at 03:49:08PM -0400, Nathan Sidwell wrote:



So, how do you expand the OACC loops on non-PTX devices (host, or say
XeonPhi)?  Do you drop the IFNs and replace stuff with normal loops?


On a non ptx target (canonical example being the host), the IFN head/tail 
markers get deleted. the IFN_LOOP builtin gets expanded to code that essentially 
restores the original loop structure.



I don't see anything that would e.g. set the various flags that e.g. OpenMP
#pragma omp simd or Cilk+ #pragma simd sets, like loop->safelen,
loop->force_vectorize, maybe loop->simduid and promote some vars to simduid
arrays if that is relevant to OpenACC.


It won't convert them into such representations.

nathan


Re: [OpenACC 7/11] execution model

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 08:50:23AM -0400, Nathan Sidwell wrote:
> >>+  expr = build2 (TRUNC_MOD_EXPR, ivar_type, ivar,
> >>+fold_convert (ivar_type, collapse->iters));
> >>+  expr = build2 (MULT_EXPR, diff_type, fold_convert (diff_type, expr),
> >>+collapse->step);
> >>+  expr = build2 (plus_code, iter_type, collapse->base,
> >>+fold_convert (plus_type, expr));
> >
> >Shouldn't these be fold_build2 instead?
> 
> I don't think fold_build2 makes a difference here, as (at least) one operand
> is  a variable?

Fold does tons of optimizations, some could be relevant even for this case.
But if you think build2 is fine, I can live with it.

> >>+/* FORK and JOIN mark the points at which OpenACC partitioned
> >>+   execution is entered or exited.  They take an INTEGER_CST argument,
> >>+   indicating the axis of forking or joining and return nothing.  */
> >>+#define IFN_UNIQUE_OACC_FORK 1
> >>+#define IFN_UNIQUE_OACC_JOIN 2
> >>+/* HEAD_MARK and TAIL_MARK are used to demark the sequence entering or
> >>+   leaving partitioned execution.  */
> >>+#define IFN_UNIQUE_OACC_HEAD_MARK 3
> >>+#define IFN_UNIQUE_OACC_TAIL_MARK 4
> >
> >Shouldn't these be in an enum, to make debugging easier?
> 
> internal-fn.def can be included multiple times in one file (probably only
> internal-fn.c).  Thus an enum would either need to go somewhere else (and
> I'd like to keep it close to the ifn def), or need to be protected in some
> manner. Hence I went with #defs, which are safe to duplicate.  Any thoughts
> on how to resolve that contention?

I think an enum in internal-fn.h is better, the IFN_UNIQUE comment can just
say that it uses this and this enum from internal-fn.h and the description
go there.  Being able to just p (enum ifn_unique_kind) gimple_call_arg (call, 0)
is valuable (though, perhaps you could even tweak the gimple-pretty-print.c
dumper to dump the first argument to IFN_UNIQUE symbolically instead of
numerically.

Jakub


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 04:07, Richard Biener wrote:

On Thu, Oct 22, 2015 at 10:04 AM, Jakub Jelinek  wrote:



Do you have to scan the whole bb?  E.g. don't or should not those
unique IFNs force end of bb?


Yeah, please make them either end or start a BB so we have to check
at most a single stmt.  ECF_RETURNS_TWICE should achieve that,
it also makes it a code motion barrier.


Thanks, I'd not thought of doing it like that.  Will try.

nathan


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Nathan Sidwell

On 10/22/15 05:55, Bernd Schmidt wrote:

On 10/22/2015 10:12 AM, Jakub Jelinek wrote:



So, is the worker broadcast buffer effectively a file scope .shared
variable?  My worry is that as .shared is quite limited resource, if you
compile many TUs and each allocates its own broadcast buffer you run out of
shared memory.  Is there any way how to share the broadcast buffers in
between different TUs (other than LTO)?


I think LTO is the mechanism, nvptx-lto1 only ever produces one assembly file.
So I'm not really concerned about this.


Correct.  PTX has no equivalent of common or weak, so we can't do the elf thing 
of emitting a common defn and having the linking process pick the largest.




One other thing about this occurred to me yesterday - I was worried about
thread-safety with a single static buffer - couldn't code execute multiple
kernels at the same time? I googled a bit, and could not actually find a
definitive answer as to whether all shared memory is allocated at kernel launch,
or just the dynamic portion?


AFAICT a single CTA doesn't execute multiple kernels concurrently.

nathan


Re: [OpenACC 7/11] execution model

2015-10-22 Thread Nathan Sidwell

On 10/22/15 08:59, Jakub Jelinek wrote:

On Thu, Oct 22, 2015 at 08:50:23AM -0400, Nathan Sidwell wrote:

+  expr = build2 (TRUNC_MOD_EXPR, ivar_type, ivar,
+fold_convert (ivar_type, collapse->iters));
+  expr = build2 (MULT_EXPR, diff_type, fold_convert (diff_type, expr),
+collapse->step);
+  expr = build2 (plus_code, iter_type, collapse->base,
+fold_convert (plus_type, expr));


Shouldn't these be fold_build2 instead?


I don't think fold_build2 makes a difference here, as (at least) one operand
is  a variable?


Fold does tons of optimizations, some could be relevant even for this case.
But if you think build2 is fine, I can live with it.


I don;t mind.  I just thought it would do a lot of checking and no actual 
transformation.



I think an enum in internal-fn.h is better, the IFN_UNIQUE comment can just
say that it uses this and this enum from internal-fn.h and the description
go there.  Being able to just p (enum ifn_unique_kind) gimple_call_arg (call, 0)
is valuable (though, perhaps you could even tweak the gimple-pretty-print.c
dumper to dump the first argument to IFN_UNIQUE symbolically instead of
numerically.


ok

nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 07:10, Julian Brown wrote:

On Thu, 22 Oct 2015 10:05:30 +0200
Richard Biener  wrote:



So you'd need to be more precise as to what properties you are trying
to preserve by placing a single stmt somewhere.


FWIW an earlier, abandoned attempt at solving the same problem was
discussed in the following thread, continuing through June:

   https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02647.html

Though the details of lowering of OpenACC constructs have changed with
Nathan's current patches, the underlying problem remains the same. PTX
requires certain operations (bar.sync) to be executed uniformly by all
threads in a CTA. IIUC this affects "JOIN" points across all
workers/vectors in a gang, in particular (though this is generic code,
other -- particularly GPU -- targets may have similar restrictions).



Richard, does  this answer your question?

nathan


Re: Don't dump low gimple functions in gimple dump

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 02:51:34PM +0200, Tom de Vries wrote:
> On 22/10/15 14:27, Jakub Jelinek wrote:
> >The state before the patch is:
> >1) the omp_fn children created during the pre-SSA ompexp pass are dumped
> >first in the *.ssa dump and in all the following ones (these are created
> >as low gimple, non-SSA)
> 
> Hi,
> 
> I do see those child fns before the ssa dump, in the fixup_cfg1 dump (and
> with -fipa-tree-all, also in the visibility dump, just after the ompexp
> dump).

Oh, indeed.  Any case, I think, with the patch 1) behaves the most desirable
way now, no dumps before ompexp show it, and ompexp dump shows both the new
child functions and original function with the outlined code already
removed.  And all dumps after ompexp show both as well.

> [ Not that I disagree with dumping the child fn in ompexp dump. ]
> 
> >2) the omp_cpyfn children created during the pre-SSA ompexp pass are dumped
> >first in the *.omplower dump and in all the following ones (these are
> >created as high gimple)

And this is probably fine too, as the cpyfn functions are artificially
created, they don't contain code that has been moved from somewhere else.

> >3) the omp_fn children created during the parloops/ompexpssa passes
> >are dumped first post-IPA (or during IPA?) and in all the following ones
> >(these are created as SSA gimple)

Bet what you'd want most would be to be able to create new functions and
mark them not just as SSA low gimple, but also be able to say, skip
everything in the pass pipeline for these functions until pass this and
this, i.e. start pass pipeline with pass->next of the creating pass.
Then it would neither show up in the earlier dumps, nor be processed
multiple times by the same passes.  We'd need the && !gimple_in_ssa_p
I've added in the patch removed once that is done...

Jakub


Re: [OpenACC 3/11] new target hook

2015-10-22 Thread Nathan Sidwell

On 10/22/15 04:15, Jakub Jelinek wrote:

On Wed, Oct 21, 2015 at 03:13:26PM -0400, Nathan Sidwell wrote:



+/* Determine whether fork & joins are needed.  */
+
+static bool
+nvptx_xform_fork_join (gcall *call, const int dims[],
+  bool ARG_UNUSED (is_fork))


Why is this not called nvptx_goacc_fork_join when that is the name of
the target hook?


don't want to make  things to clear ... (will fix).

nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 09:08:30AM -0400, Nathan Sidwell wrote:
> On 10/22/15 07:10, Julian Brown wrote:
> >On Thu, 22 Oct 2015 10:05:30 +0200
> >Richard Biener  wrote:
> 
> >>So you'd need to be more precise as to what properties you are trying
> >>to preserve by placing a single stmt somewhere.
> >
> >FWIW an earlier, abandoned attempt at solving the same problem was
> >discussed in the following thread, continuing through June:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02647.html
> >
> >Though the details of lowering of OpenACC constructs have changed with
> >Nathan's current patches, the underlying problem remains the same. PTX
> >requires certain operations (bar.sync) to be executed uniformly by all
> >threads in a CTA. IIUC this affects "JOIN" points across all
> >workers/vectors in a gang, in particular (though this is generic code,
> >other -- particularly GPU -- targets may have similar restrictions).
> 
> 
> Richard, does  this answer your question?

I agree with Richard that it would be better to write more about what kind
of IL changes are acceptable with IFN_UNIQUE in the IL and what are not.
E.g. is inlining ok (I'd hope yes)?  Is function splitting ok (bet as long
as all IFN_UNIQUE calls stay in one or the other part, but not both)?
Various loop optimization, ...

Jakub


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Nathan Sidwell

On 10/22/15 09:01, Nathan Sidwell wrote:

On 10/22/15 05:55, Bernd Schmidt wrote:

On 10/22/2015 10:12 AM, Jakub Jelinek wrote:



So, is the worker broadcast buffer effectively a file scope .shared
variable?  My worry is that as .shared is quite limited resource, if you
compile many TUs and each allocates its own broadcast buffer you run out of
shared memory.  Is there any way how to share the broadcast buffers in
between different TUs (other than LTO)?


I think LTO is the mechanism, nvptx-lto1 only ever produces one assembly file.
So I'm not really concerned about this.


Correct.  PTX has no equivalent of common or weak, so we can't do the elf thing
of emitting a common defn and having the linking process pick the largest.


oh, and I even thought of having a bunch of defns in a library of the form
long worker_buf_:

and then having the emitted code reference the set that it needed so that the 
linker would concatenate them into a single object.  But PTX has no concept of 
sections, so couldn't gatheer those decls into contiguous memory,


nathan


[PATCH] [AArch64] support -mfentry feature for arm64

2015-10-22 Thread Li Bin
From: Jiangjiji 

* gcc/config/aarch64/aarch64.opt: Add a new option.
* gcc/config/aarch64/aarch64.c: Add some new functions and Macros.
* gcc/config/aarch64/aarch64.h: Modify PROFILE_HOOK and FUNCTION_PROFILER.

Signed-off-by: Jiangjiji 
Signed-off-by: Li Bin 
---
 gcc/config/aarch64/aarch64.c   |   23 +++
 gcc/config/aarch64/aarch64.h   |   13 -
 gcc/config/aarch64/aarch64.opt |4 
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 752df4e..c70b161 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -440,6 +440,17 @@ aarch64_is_long_call_p (rtx sym)
   return aarch64_decl_is_long_call_p (SYMBOL_REF_DECL (sym));
 }
 
+void
+aarch64_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
+{
+   if (flag_fentry)
+   {
+   fprintf (file, "\tmov\tx9, x30\n");
+   fprintf (file, "\tbl\t__fentry__\n");
+   fprintf (file, "\tmov\tx30, x9\n");
+   }
+}
+
 /* Return true if the offsets to a zero/sign-extract operation
represent an expression that matches an extend operation.  The
operands represent the paramters from
@@ -7414,6 +7425,15 @@ aarch64_emit_unlikely_jump (rtx insn)
   add_int_reg_note (insn, REG_BR_PROB, very_unlikely);
 }
 
+/* Return true, if profiling code should be emitted before
+ * prologue. Otherwise it returns false.
+ * Note: For x86 with "hotfix" it is sorried.  */
+static bool
+aarch64_profile_before_prologue (void)
+{
+   return flag_fentry != 0;
+}
+
 /* Expand a compare and swap pattern.  */
 
 void
@@ -8454,6 +8474,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
 #undef TARGET_ASM_ALIGNED_SI_OP
 #define TARGET_ASM_ALIGNED_SI_OP "\t.word\t"
 
+#undef TARGET_PROFILE_BEFORE_PROLOGUE
+#define TARGET_PROFILE_BEFORE_PROLOGUE aarch64_profile_before_prologue
+
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK \
   hook_bool_const_tree_hwi_hwi_const_tree_true
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 77b2bb9..65e34fc 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -804,13 +804,16 @@ do {  
 \
 #define PROFILE_HOOK(LABEL)\
   {\
 rtx fun, lr;   \
-lr = get_hard_reg_initial_val (Pmode, LR_REGNUM);  \
-fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME); \
-emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode);   \
+   if (!flag_fentry)
+ {
+   lr = get_hard_reg_initial_val (Pmode, LR_REGNUM);   
\
+   fun = gen_rtx_SYMBOL_REF (Pmode, MCOUNT_NAME);  
\
+   emit_library_call (fun, LCT_NORMAL, VOIDmode, 1, lr, Pmode);
\
+ }
   }
 
-/* All the work done in PROFILE_HOOK, but still required.  */
-#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0)
+#define FUNCTION_PROFILER(STREAM, LABELNO)
+   aarch64_function_profiler(STREAM, LABELNO)
 
 /* For some reason, the Linux headers think they know how to define
these macros.  They don't!!!  */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 266d873..9e4b408 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -124,3 +124,7 @@ Enum(aarch64_abi) String(ilp32) Value(AARCH64_ABI_ILP32)
 
 EnumValue
 Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
+
+mfentry
+Target Report Var(flag_fentry) Init(0)
+Emit profiling counter call at function entry immediately after prologue.
-- 
1.7.1



Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 09:17, Jakub Jelinek wrote:

On Thu, Oct 22, 2015 at 09:08:30AM -0400, Nathan Sidwell wrote:



I agree with Richard that it would be better to write more about what kind
of IL changes are acceptable with IFN_UNIQUE in the IL and what are not.
E.g. is inlining ok (I'd hope yes)?  Is function splitting ok (bet as long
as all IFN_UNIQUE calls stay in one or the other part, but not both)?


Essentially, yes.  a set of IFN_UNIQUE form a group  which must not be separated 
 from each other.  The set is discovered implicitly by following the CFG 
(though I suppose we could add an identifying INT_CST operand or something 
equivalent).


nathan


[PATCH] Use strided stores when interleaving fails

2015-10-22 Thread Richard Biener

This makes the vectorizer use strided accesses when single-element
interleaving fails.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-10-22  Richard Biener  

PR tree-optimization/19049
PR tree-optimization/65962
* tree-vect-data-refs.c (vect_analyze_group_access_1): Fall back
to strided accesses if single-element interleaving doesn't work.

* gcc.dg/vect/vect-strided-store-pr65962.c: New testcase.
* gcc.dg/vect/vect-63.c: Adjust.
* gcc.dg/vect/vect-70.c: Likewise.
* gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
* gfortran.dg/vect/pr19049.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 229167)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_analyze_group_access_1 (struct data
*** 2114,2120 
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "not consecutive access ");
  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
- dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
  }
  
if (bb_vinfo)
--- 2136,2141 
*** vect_analyze_group_access_1 (struct data
*** 2124,2130 
return true;
  }
  
!   return false;
  }
  
if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt)
--- 2145,2153 
return true;
  }
  
!   dump_printf_loc (MSG_NOTE, vect_location, "using strided accesses\n");
!   STMT_VINFO_STRIDED_P (stmt_info) = true;
!   return true;
  }
  
if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt)
Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-pr65962.c
===
*** gcc/testsuite/gcc.dg/vect/vect-strided-store-pr65962.c  (revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-strided-store-pr65962.c  (working copy)
***
*** 0 
--- 1,14 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_int } */
+ 
+ void
+ loop (int *data)
+ {
+   for (int i = 0; i < 256; i++)
+ data[i * 2] += 7;
+ }
+ 
+ /* As we can't use interleaving for the store with gaps we have to
+use strided stores.  */
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-63.c
===
*** gcc/testsuite/gcc.dg/vect/vect-63.c (revision 229167)
--- gcc/testsuite/gcc.dg/vect/vect-63.c (working copy)
*** int main1 ()
*** 13,19 
int ia[N*2][4][N];
  
/* Multidimensional array. Aligned. 
!  The first dimension depends on j: not vectorizable. */
for (i = 0; i < N; i++)
  {
for (j = 0; j < N; j++)
--- 13,19 
int ia[N*2][4][N];
  
/* Multidimensional array. Aligned. 
!  The first dimension depends on j: use strided stores. */
for (i = 0; i < N; i++)
  {
for (j = 0; j < N; j++)
*** int main (void)
*** 42,45 
return main1 ();
  }
  
! /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail 
*-*-* } } } */
--- 42,45 
return main1 ();
  }
  
! /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-70.c
===
*** gcc/testsuite/gcc.dg/vect/vect-70.c (revision 229167)
--- gcc/testsuite/gcc.dg/vect/vect-70.c (working copy)
*** int main1 ()
*** 37,43 
abort ();
  }

!   /* not consecutive */
for (i = 0; i < N; i++)
  for (j = 3; j < N-3; j++)
{ 
--- 37,43 
abort ();
  }

!   /* not consecutive, will use strided stores */
for (i = 0; i < N; i++)
  for (j = 3; j < N-3; j++)
{ 
*** int main (void)
*** 62,68 
return main1 ();
  }

! /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 
"vect" } } */
  /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
peeling" 1 "vect" {target { vector_alignment_reachable} } } } */
  /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
versioning" 1 "vect" {target {{! vector_alignment_reachable} && {! 
vect_hw_misalign} } } } } */
--- 62,68 
return main1 ();
  }

! /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
  /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 
"vect" } } */
  /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
peeling" 1 "vect" {target { vector_alignment_reachable} } } } */
  /* { dg

[PATCH] Mitigate PR58497

2015-10-22 Thread Richard Biener

I've had a patch in my dev tree for quite a while that lowers uniform
vector stmts to scalar stmts.  This also mitigates PR58497 so I decided
to push it out now.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-10-22  Richard Biener  

PR tree-optimization/58497
* tree-vect-generic.c (ssa_uniform_vector_p): New helper.
(expand_vector_operations_1): Use it.  Lower operations on
all uniform vectors to scalar operations if the HW supports it.

* gcc.dg/tree-ssa/vector-5.c: New testcase.

Index: gcc/tree-vect-generic.c
===
*** gcc/tree-vect-generic.c (revision 229167)
--- gcc/tree-vect-generic.c (working copy)
*** lower_vec_perm (gimple_stmt_iterator *gs
*** 1339,1344 
--- 1339,1361 
update_stmt (gsi_stmt (*gsi));
  }
  
+ /* If OP is a uniform vector return the element it is a splat from.  */
+ 
+ static tree
+ ssa_uniform_vector_p (tree op)
+ {
+   if (TREE_CODE (op) == VECTOR_CST
+   || TREE_CODE (op) == CONSTRUCTOR)
+ return uniform_vector_p (op);
+   if (TREE_CODE (op) == SSA_NAME)
+ {
+   gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+   if (gimple_assign_single_p (def_stmt))
+   return uniform_vector_p (gimple_assign_rhs1 (def_stmt));
+ }
+   return NULL_TREE;
+ }
+ 
  /* Return type in which CODE operation with optab OP can be
 computed.  */
  
*** expand_vector_operations_1 (gimple_stmt_
*** 1505,1510 
--- 1522,1550 
if (TREE_CODE (type) != VECTOR_TYPE)
  return;
  
+   /* If the vector operation is operating on all same vector elements
+  implement it with a scalar operation and a splat if the target
+  supports the scalar operation.  */
+   tree srhs1, srhs2 = NULL_TREE;
+   if ((srhs1 = ssa_uniform_vector_p (rhs1)) != NULL_TREE
+   && (rhs2 == NULL_TREE
+ || (srhs2 = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
+   /* As we query direct optabs restrict to non-convert operations.  */
+   && TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (srhs1)))
+ {
+   op = optab_for_tree_code (code, TREE_TYPE (type), optab_scalar);
+   if (optab_handler (op, TYPE_MODE (TREE_TYPE (type))) != 
CODE_FOR_nothing)
+   {
+ tree slhs = make_ssa_name (TREE_TYPE (srhs1));
+ gimple *repl = gimple_build_assign (slhs, code, srhs1, srhs2);
+ gsi_insert_before (gsi, repl, GSI_SAME_STMT);
+ gimple_assign_set_rhs_from_tree (gsi,
+  build_vector_from_val (type, slhs));
+ update_stmt (stmt);
+ return;
+   }
+ }
+  
/* A scalar operation pretending to be a vector one.  */
if (VECTOR_BOOLEAN_TYPE_P (type)
&& !VECTOR_MODE_P (TYPE_MODE (type))
*** expand_vector_operations_1 (gimple_stmt_
*** 1554,1568 
if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
  {
tree first;
- gimple *def_stmt;
  
!   if ((TREE_CODE (rhs2) == VECTOR_CST
!  && (first = uniform_vector_p (rhs2)) != NULL_TREE)
! || (TREE_CODE (rhs2) == SSA_NAME
! && (def_stmt = SSA_NAME_DEF_STMT (rhs2))
! && gimple_assign_single_p (def_stmt)
! && (first = uniform_vector_p
! (gimple_assign_rhs1 (def_stmt))) != NULL_TREE))
  {
gimple_assign_set_rhs2 (stmt, first);
update_stmt (stmt);
--- 1594,1601 
if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
  {
tree first;
  
!   if ((first = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
  {
gimple_assign_set_rhs2 (stmt, first);
update_stmt (stmt);
Index: gcc/testsuite/gcc.dg/tree-ssa/vector-5.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/vector-5.c(revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/vector-5.c(working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-optimized" } */
+ 
+ typedef int v4si __attribute__((vector_size(4*sizeof (int;
+ 
+ v4si v;
+ int foo (int i)
+ {
+   v4si v1 = (v4si) { i, i, i, i };
+   v4si v2 = (v4si) { 3, 3, 3, 3 };
+   v = v1 * v2;
+ }
+ 
+ /* The operation should be carried out as scalar op.  */
+ /* { dg-final { scan-tree-dump-times " \* 3;" 1 "optimized" } } */


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2015-10-22 Thread Marcus Shawcroft
On 22 October 2015 at 14:21, Li Bin  wrote:
> From: Jiangjiji 
>
> * gcc/config/aarch64/aarch64.opt: Add a new option.
> * gcc/config/aarch64/aarch64.c: Add some new functions and Macros.
> * gcc/config/aarch64/aarch64.h: Modify PROFILE_HOOK and FUNCTION_PROFILER.
>
> Signed-off-by: Jiangjiji 
> Signed-off-by: Li Bin 

Hi,  Please work with Maxim
https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html towards a generic
solution to this issue.  If, in the future it becomes apparent that
his proposal is not viable then we can reconsider a backend centric
solution.

Cheers
/Marcus


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Nathan Sidwell

On 10/22/15 05:37, Jakub Jelinek wrote:


And, I must say I'm at least missing testcases that check parsing but also
runtime behavior of the vector or worker clause arguments (there
is one gang (static:1) clause, but not the other clauses nor other styles of
gang arguments.


the static clause is only valid on gang.  But you're right, some error tests 
would be good to include in this patch set.


nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 04:07, Richard Biener wrote:


Yeah, please make them either end or start a BB so we have to check
at most a single stmt.  ECF_RETURNS_TWICE should achieve that,
it also makes it a code motion barrier.


Just so I'm clear, you're not saying that RETURNS_TWICE will stop the call being 
duplicated though?


thinking a little further, a code motion barrier is stronger than I need (but 
conservatively safe).  For instance:


UNIQUE (HEAD)
for (...)
{
  a = 
}
UNIQUE (TAIL)

It would be safe and desirable to move that loop invariant to before the UNIQUE. 
 Perhaps it won't matter in practice -- after all having N physical threads 
calculate it in parallel (just after the HEAD marker, but before the loop) will 
probably take no longer than a single thread doing it while the others wait.[*]


nathan

[*] ut it will take more power.


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Bernd Schmidt

On 10/21/2015 09:09 PM, Nathan Sidwell wrote:

At the beginning of a partitioned region, we have to  propagate live
register state and stack frame from engine-zero to the other engines
(just as would happen on a regular 'fork' call).


This is something I'm not terribly happy about, but since I have no 
alternative to offer, it's fine. I expect it could turn out to be a 
glass jaw of the implementation though. Speaking of which - do you have 
any recent speedup numbers for nontrivial OpenACC programs?


Some other minor nitpicks:


+  /* Emit fork at all levels, this helps form SESE regions..  */


Could expand the comment, it doesn't help me understand the issue. Also, 
punctuation.



+/* Structure used when generating a worker-level spill or fill.  */
+
+struct wcast_data_t
+{
+  rtx base;
+  rtx ptr;
+  unsigned offset;
+};


Could document the members.


+/* Loop structure of the function.The entire function is described as


Whitespace.


+  // Clear visited flag, for use by parallel locator  */


Odd comment style :-)


+
+static void
+nvptx_propagate (basic_block block, rtx_insn *insn, propagate_mask rw,
+rtx (*fn) (rtx, propagate_mask,
+   unsigned, void *), void *data)


Break that out into a typedef?


+ /* Allow worker function to initialize anything needed */


Punctuation.


+   default:break;


Formatting.


+  if (par->mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
+{ /* No propagation needed for a call.  */ }
+  else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))


Ok that looks weird with the open brace on the line before the else. I 
think the standard practice is to just use "/* .. */;", but possibly 
just invert the if condition and move the else branches into it.



+  unsigned me = par->mask
+& (GOMP_DIM_MASK (GOMP_DIM_WORKER) | GOMP_DIM_MASK (GOMP_DIM_VECTOR));


Formatting. Maybe have extra defines for the masks so you don't have to 
spell GOMP_DIM_MASK everytime?



+  if ((outer | me) & GOMP_DIM_MASK (mode))
+   { /* Mode is partitioned: no neutering.  */ }
+  else if (!(modes & GOMP_DIM_MASK (mode)))
+   { /* Mode  is not used: nothing to do.  */ }


Same issue as above. Whitespace in comment.


+  else
+   { /* Parent will skip this parallel itself.  */ }


Here too - actually no need to have an empty else at all.


+  "%.\\tmov.b64\\t{%0,%1}, %2;")

> +  "%.\\tmov.b64\\t%0, {%1,%2};")

Might want to add a space after the comma. I can see arguments for and 
against, so do what you like.



Bernd


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 09:53:46AM -0400, Nathan Sidwell wrote:
> On 10/22/15 05:37, Jakub Jelinek wrote:
> 
> >And, I must say I'm at least missing testcases that check parsing but also
> >runtime behavior of the vector or worker clause arguments (there
> >is one gang (static:1) clause, but not the other clauses nor other styles of
> >gang arguments.
> 
> the static clause is only valid on gang.

That is what I've figured out.
But it is unclear from the parsing what from these is allowed:
int v, w;
...
gang(26)
gang(v)
vector(length: 16)
vector(length: v)
vector(16)
vector(v)
worker(num: 16)
worker(num: v)
worker(16)
worker(v)
gang(16, 24)
gang(v, w)
gang(static: 16, num: 5)
gang(static: v, num: w)
gang(num: 5, static: 4)
gang(num: v, static: w)

and if the length: or num: part is really optional, then
int length, num;
vector(length)
worker(num)
gang(num, static: 6)
gang(static: 5, num)
should be also accepted (or subset thereof?).

Jakub


RFA/RFC: insns that do not start a source line

2015-10-22 Thread Nick Clifton
Hi Guys,

  Sometimes gcc can generate instructions out of order with respect to
  lines of source code, and this can lead to problems for debuggers.
  For example, consider this source code snippet:

  v = 0; /* Line 31 */
  goto b;/* Line 32 */
a:  v++; /* Line 33 */
b:  if (v < 3) goto a;   /* Line 34 */

  Compiled without optimization, but with -g, gcc will produce code like
  this:
  
400692: c7 05 94 09 20 00 00movl   $0x0,0x200994(%rip)
400699: 00 00 00 
40069c: eb 10   jmp4006ae 
40069e: 90  nop
40069f: 8b 05 8b 09 20 00   mov0x20098b(%rip),%eax
4006a5: 83 c0 01add$0x1,%eax
4006a8: 89 05 82 09 20 00   mov%eax,0x200982(%rip)
4006ae: 8b 05 7c 09 20 00   mov0x20097c(%rip),%eax
4006b4: 83 f8 02cmp$0x2,%eax
4006b7: 7e e5   jle40069e 
4006b9: 90  nop

  This example uses x86_64, but the same problem occurs for most
  architectures.  Note the two NOP instructions.  The DWARF line table
  information produced for this code looks like this:

advance Address by  5 to 0x400692 and Line by 7 to 31
advance Address by 10 to 0x40069c and Line by 1 to 32
advance Address by  2 to 0x40069e and Line by 2 to 34
advance Address by  1 to 0x40069f and Line by -1 to 33
advance Address by 15 to 0x4006ae and Line by 1 to 34
advance Address by 11 to 0x4006b9 and Line by 1 to 35

  (Note the backwards movement of the line number to the start of line
  33).  The problem is line 34.  According to the line table it starts
  at address 0x40069e whereas in reality it starts at 0x4006ae.  What
  has happened is that gcc has generated a NOP instruction to be the
  destination of the goto on line 34 - ie the label a: - but rather than
  associate it with line 33, it has associated it with line 34.  This
  means that line 34 now appears twice in the line number table.

  I am offering the patch below as a fix for this problem, although I am
  open to suggestions for better solutions.  The patch creates a hash
  table for instructions which gcc knows should not be considered to be
  the start of a source code line, then it checks this table when
  noticing source line changes.  The patch only adds the NOP
  instructions generated for the test code above, although I expect that
  in the future there may be other places that need to record this
  information.

  Tested with no regressions on an x86_64-pc-linux toolchain.

  Comments / questions / approval ?

Cheers
  Nick

gcc/ChangeLog
2015-10-22  Nick Clifton  

* cfgrtl.c (not_a_stmt_hasher): New struct.
(not_a_stmt_htab): New hash table.
(is_a_stmt): New function.  Returns true iff the given insn is on
the list of insns known not to start a line of source code.
(fixup_reorder_chain): Add generated NOP instructions to the list
of instructions that are known not to start a source line.
* cfgrtl.h (is_a_stmt): Add prototype.
* final.c (notice_source_line): Use the new function to set the
is_stmt return value.

Index: gcc/cfgrtl.c
===
--- gcc/cfgrtl.c(revision 229162)
+++ gcc/cfgrtl.c(working copy)
@@ -3665,7 +3665,21 @@
   compact_blocks ();
 }
 
+struct not_a_stmt_hasher : ggc_cache_ptr_hash
+{
+  static hashval_t hash (rtx x) { return htab_hash_pointer (x); }
+  static bool equal (rtx a, rtx b) { return a == b; }
+};
 
+static GTY((cache)) hash_table *not_a_stmt_htab;
+
+/* Return TRUE if INSN is on the list of not-an-insn.  */
+bool
+is_a_stmt (rtx_insn *insn)
+{
+  return not_a_stmt_htab == NULL || ! not_a_stmt_htab->find (insn);
+}
+
 /* Given a reorder chain, rearrange the code to match.  */
 
 static void
@@ -3944,8 +3958,18 @@
}
  nb = split_edge (e);
  if (!INSN_P (BB_END (nb)))
-   BB_END (nb) = emit_insn_after_noloc (gen_nop (), BB_END (nb),
-nb);
+   {
+ BB_END (nb) = emit_insn_after_noloc (gen_nop (), BB_END (nb),
+  nb);
+ 
+ /* Note that this insn is a fake and should not be considered
+to be the starting insn of a line of source code.  */
+ if (not_a_stmt_htab == NULL)
+   not_a_stmt_htab = hash_table::create_ggc 
(17);
+ rtx_def ** slot;
+ slot = not_a_stmt_htab->find_slot (BB_END (nb), INSERT);
+ * slot = BB_END (nb);
+   }
  INSN_LOCATION (BB_END (nb)) = e->goto_locus;
 
  /* If there are other incoming edges to the destination block
Index: gcc/cfgrt

Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin  wrote:
> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
> >> > H.J.,
> >> > Maybe linker should print some warning about joining writable + 
> >> > nonwritable
> >> > sections?  Here is a simple testcase:
> >> >
> >> > $ cat t1.s
> >> > .section ".AAA", "a"
> >> > .long 0x12345678
> >> > $ cat t2.s
> >> > .section ".AAA", "wa"
> >> > .long 0x12345678
> >> > $ as t1.s -o t1.o
> >> > $ as t2.s -o t2.o
> >> > $ ld -shared t1.o t2.o
> >> > $ ls -lh a.out
> >> > 2.1M a.out
> >> >
> >>
> >> Does linker make AAA  writable? If yes, linker does what it
> >> is told.
> >
> > Yes, it makes it writable, but why it also makes this?
> >
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH 00b0  00b0
> >0028  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >0078  0018   A   3 2 8
> >   [ 3] .dynstr   STRTAB   0150  0150
> >0019     A   0 0 1
> >   [ 4] .AAA  PROGBITS 0169  0169
> >0008    WA   0 0 1
> >   [ 5] .eh_frame PROGBITS 0178  0178
> >     A   0 0 8
> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- 
> > ???
> >00b0  0010  WA   3 0 8
> >   [ 7] .shstrtab STRTAB     00200380
> >0049     0 0 1
> >   [ 8] .symtab   SYMTAB     00200228
> >0120  0018   9 9 8
> >   [ 9] .strtab   STRTAB     00200348
> >0038     0 0 1
> >
> 
> Linker groups input sections by section name and ors section
> flags.

Could you please help figure out how this number 0x200178 is calculated?
ld -verbose doesn't show anything helpful.  It seems that something goes wrong
during section-to-segment mapping, because when both .AAA have "wa" flags, we
got small binary with 2 LOAD segments:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x01a8 0x01a8  R  20
  LOAD   0x01a8 0x002001a8 0x002001a8
 0x00b8 0x00b8  RW 20

But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
binary with only one big LOAD segment:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  LOAD   0x 0x 0x
 0x00200228 0x00200228  RW 20

BTW, gold produces small binary in both cases.

Thanks,
  -- Ilya


Re: RFA/RFC: insns that do not start a source line

2015-10-22 Thread Bernd Schmidt

Hi Nick,

On 10/22/2015 04:07 PM, Nick Clifton wrote:

   Sometimes gcc can generate instructions out of order with respect to
   lines of source code, and this can lead to problems for debuggers.
   For example, consider this source code snippet:

   v = 0; /* Line 31 */
   goto b;   /* Line 32 */
 a:  v++; /* Line 33 */
 b:  if (v < 3) goto a;   /* Line 34 */




 What
   has happened is that gcc has generated a NOP instruction to be the
   destination of the goto on line 34 - ie the label a: - but rather than
   associate it with line 33, it has associated it with line 34.  This
   means that line 34 now appears twice in the line number table.


Could you point me at the code generating these NOPs (and maybe a 
testcase)? Also, is this the only case you know of or are there others?



Bernd


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-10-22 Thread Alan Lawrence

Just one very small point...

On 19/10/15 09:17, Alan Hayward wrote:

> -  if (check_reduction
> -  && (!commutative_tree_code (code) || !associative_tree_code (code)))
> +  if (check_reduction)
>  {
> -  if (dump_enabled_p ())
> -report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> -  "reduction: not commutative/associative: ");
> -  return NULL;
> +  if (code != COND_EXPR
> +&& (!commutative_tree_code (code) || !associative_tree_code (code)))
> +  {
> +if (dump_enabled_p ())
> +  report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> +  "reduction: not commutative/associative: ");
> +return NULL;
> +  }
> +
> +  if (code == COND_EXPR)
> +  *v_reduc_type = COND_REDUCTION;

Wouldn't this be easier written as

if (code == COND_EXPR)
  *v_reduc_type = COND_REDUCTION;
else if (!commutative_tree_code (code) || !associative_tree_code (code))
  {...}

? Your call!

Cheers, Alan



Re: [PATCH v2 13/13] Add hook for modifying debug info for address spaces

2015-10-22 Thread Ulrich Weigand
Richard Henderson wrote:

> So you would recommend continuing to use address_class for these x86 segments?

Yes, I think so.  As far as I can see, this would simply require to define two
address_class values to correspond to __seg_fs and __seg_gs.  (These choices
should best be documented in the platform ABI.)

GDB infrastructure to decode and use address_class should already be in place,
you simply need to implement the relevant gdbarch callbacks:

  address_class_type_flags
Convert from DWARF address class to GDB internal encoding

  address_class_type_flags_to_name
  address_class_name_to_type_flags
Convert between GDB internal encoding and printable name (e.g. "__seg_fs")

  address_to_pointer
  pointer_to_address
Convert between a GDB flat address (CORE_ADDR) and a target-specific
pointer encoding, based on the pointer's address class flags

Of course, to do the latter, you still need access to the segment base values,
as discussed on the GDB mailing list.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: RFA/RFC: insns that do not start a source line

2015-10-22 Thread Nick Clifton

Hi Bernd,


Could you point me at the code generating these NOPs


It is in cfgrtl.c:fixup_reorder_chain()


  nb = split_edge (e);
  if (!INSN_P (BB_END (nb)))
BB_END (nb) = emit_insn_after_noloc (gen_nop (), BB_END (nb),
 nb);
  INSN_LOCATION (BB_END (nb)) = e->goto_locus;


 and maybe a testcase)?


Full testcase attached.  Compile with -O0 -g.


 Also, is this the only case you know of or are there others?


Sorry, I have not investigated this.  The problem was reported to me by 
a GDB engineer, who was investigating gdb's failure to apply breakpoints 
to the correct locations under some circumstances (such as the code in 
the testcase).  There may well be other tests that have similar 
properties, especially loops that are unrolled or goto statements that 
cross basic blocks.


Cheers
  Nick


volatile int v;
volatile int w;

void
loop_test (void)
{
  v = 0;

  while (v < 3) /* Loop 1 condition */
{
  v++;  /* Loop 1 increment */
}

  v = -42;

  for (v = 0; v < 3; )  /* Loop 2 */
{
  v++;  /* Loop 2 increment */
}

  v = -42;
  w = 42;

  for (v = 0;   /* Loop 3 initialization */
   v < 3;   /* Loop 3 condition */
   v++) /* Loop 3 increment */
 {
  w = w - v;
 }

  v = 0;
  goto b;   /* Loop 4 initial goto */
a:  v++;
b:  if (v < 3) goto a;  /* Loop 4 condition */
}

int main (int argc, char **argv)
{
  loop_test ();

  return 0;
}


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-10-22 Thread Alan Hayward


On 22/10/2015 15:15, "Alan Lawrence"  wrote:

>Just one very small point...
>
>On 19/10/15 09:17, Alan Hayward wrote:
>
> > -  if (check_reduction
> > -  && (!commutative_tree_code (code) || !associative_tree_code
>(code)))
> > +  if (check_reduction)
> >  {
> > -  if (dump_enabled_p ())
> > -report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> > -   "reduction: not commutative/associative: ");
> > -  return NULL;
> > +  if (code != COND_EXPR
> > + && (!commutative_tree_code (code) || !associative_tree_code
>(code)))
> > +   {
> > + if (dump_enabled_p ())
> > +   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> > +   "reduction: not commutative/associative: ");
> > + return NULL;
> > +   }
> > +
> > +  if (code == COND_EXPR)
> > +   *v_reduc_type = COND_REDUCTION;
>
>Wouldn't this be easier written as
>
>if (code == COND_EXPR)
>   *v_reduc_type = COND_REDUCTION;
>else if (!commutative_tree_code (code) || !associative_tree_code (code))
>   {...}
>
>? Your call!
>
>Cheers, Alan


Good spot! I suspect that’s slipped through when I’ve rewritten bits.
I’ll add this in with Richard’s suggestion of the change around the call
to vectorizable_condition.


Alan.




Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:05, Jakub Jelinek wrote:

On Thu, Oct 22, 2015 at 09:53:46AM -0400, Nathan Sidwell wrote:

On 10/22/15 05:37, Jakub Jelinek wrote:


And, I must say I'm at least missing testcases that check parsing but also
runtime behavior of the vector or worker clause arguments (there
is one gang (static:1) clause, but not the other clauses nor other styles of
gang arguments.


the static clause is only valid on gang.


That is what I've figured out.
But it is unclear from the parsing what from these is allowed:


good questions.  As you may have guessed, I'm not the primary author of the 
parsing code.  Cesar's stepped up to address this.


nathan


Re: [0/7] Type promotion pass and elimination of zext/sext

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 12:50 PM, Kugan
 wrote:
>
>
> On 21/10/15 23:45, Richard Biener wrote:
>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>  wrote:
>>>
>>>
>>> On 07/09/15 12:53, Kugan wrote:

 This a new version of the patch posted in
 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
 more testing and spitted the patch to make it more easier to review.
 There are still couple of issues to be addressed and I am working on them.

 1. AARCH64 bootstrap now fails with the commit
 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled
 in stage2 and fwprop.c is failing. It looks to me that there is a latent
 issue which gets exposed my patch. I can also reproduce this in x86_64
 if I use the same PROMOTE_MODE which is used in aarch64 port. For the
 time being, I am using  patch
 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
 workaround. This meeds to be fixed before the patches are ready to be
 committed.

 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
 -O3 -g Error: unaligned opcodes detected in executable segment. It works
 fine if I remove the -g. I am looking into it and needs to be fixed as 
 well.
>>>
>>> Hi Richard,
>>>
>>> Now that stage 1 is going to close, I would like to get these patches
>>> accepted for stage1. I will try my best to address your review comments
>>> ASAP.
>>
>> Ok, can you make the whole patch series available so I can poke at the
>> implementation a bit?  Please state the revision it was rebased on
>> (or point me to a git/svn branch the work resides on).
>>
>
> Thanks. Please find the patched rebated against trunk@229156. I have
> skipped the test-case readjustment patches.

Some quick observations.  On x86_64 when building

short bar (short y);
int foo (short x)
{
  short y = bar (x) + 15;
  return y;
}

with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
I get

  :
  _1 = (int) x_10(D);
  _2 = (_1) sext (16);
  _11 = bar (_2);
  _5 = (int) _11;
  _12 = (unsigned int) _5;
  _6 = _12 & 65535;
  _7 = _6 + 15;
  _13 = (int) _7;
  _8 = (_13) sext (16);
  _9 = (_8) sext (16);
  return _9;

which looks fine but the VRP optimization doesn't trigger for the redundant sext
(ranges are computed correctly but the 2nd extension is not removed).

This also makes me notice trivial match.pd patterns are missing, like
for example

(simplify
 (sext (sext@2 @0 @1) @3)
 (if (tree_int_cst_compare (@1, @3) <= 0)
  @2
  (sext @0 @3)))

as VRP doesn't run at -O1 we must rely on those to remove rendudant extensions,
otherwise generated code might get worse compared to without the pass(?)

I also notice that the 'short' argument does not get it's sign-extension removed
as redundand either even though we have

_1 = (int) x_8(D);
Found new range for _1: [-32768, 32767]

In the end I suspect that keeping track of the "simple" cases in the promotion
pass itself (by keeping a lattice) might be a good idea (after we fix VRP to do
its work).  In some way whether the ABI guarantees promoted argument
registers might need some other target hook queries.

Now onto the 0002 patch.

+static bool
+type_precision_ok (tree type)
+{
+  return (TYPE_PRECISION (type)  == 8
+ || TYPE_PRECISION (type) == 16
+ || TYPE_PRECISION (type) == 32);
+}

that's a weird function to me.  You probably want
TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?

+/* Return the promoted type for TYPE.  */
+static tree
+get_promoted_type (tree type)
+{
+  tree promoted_type;
+  enum machine_mode mode;
+  int uns;
+  if (POINTER_TYPE_P (type)
+  || !INTEGRAL_TYPE_P (type)
+  || !type_precision_ok (type))
+return type;
+
+  mode = TYPE_MODE (type);
+#ifdef PROMOTE_MODE
+  uns = TYPE_SIGN (type);
+  PROMOTE_MODE (mode, uns, type);
+#endif
+  uns = TYPE_SIGN (type);
+  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
+  if (promoted_type
+  && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
+type = promoted_type;

I think what you want to verify is that TYPE_PRECISION (promoted_type)
== GET_MODE_PRECISION (mode).
And to not even bother with this simply use

promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns);

You use a domwalk but also might create new basic-blocks during it
(insert_on_edge_immediate), that's a
no-no, commit edge inserts after the domwalk.
ssa_sets_higher_bits_bitmap looks unused and
we generally don't free dominance info, so please don't do that.

I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc with

/abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
-B/usr/local/powerpc64-unknown-linux-gnu/bin/
-B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
/usr/local/powerpc64-unknown-linux-gnu/include -isystem
/usr/local/powerpc64-unknown-linux-gnu/sy

Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:04, Bernd Schmidt wrote:


+  if (par->mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
+{ /* No propagation needed for a call.  */ }
+  else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))


Ok that looks weird with the open brace on the line before the else. I think the
standard practice is to just use "/* .. */;", but possibly just invert the if
condition and move the else branches into it.


I find it more obviously an empty if -- that ';' can get lost with the comment 
(I have vague memories of a compiler warning too, I'll give it a try.  Inverting 
the condition makes the sequence confusing, IMHO.




+  else
+{ /* Parent will skip this parallel itself.  */ }


Here too - actually no need to have an empty else at all.


I wanted somewhere clear for the comment to go. (Actually, I think this is the 
one the compiler warns about -- empty dangling else).



+  "%.\\tmov.b64\\t{%0,%1}, %2;")

 > +  "%.\\tmov.b64\\t%0, {%1,%2};")

Might want to add a space after the comma. I can see arguments for and against,
so do what you like.


Yeah, same thoughts ran through my head ...

nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 4:01 PM, Nathan Sidwell  wrote:
> On 10/22/15 04:07, Richard Biener wrote:
>
>> Yeah, please make them either end or start a BB so we have to check
>> at most a single stmt.  ECF_RETURNS_TWICE should achieve that,
>> it also makes it a code motion barrier.
>
>
> Just so I'm clear, you're not saying that RETURNS_TWICE will stop the call
> being duplicated though?

It will in practice.  RETURNS_TWICE will get you an abnormal edge from
entry (I think)

> thinking a little further, a code motion barrier is stronger than I need
> (but conservatively safe).  For instance:
>
> UNIQUE (HEAD)
> for (...)
> {
>   a = 
> }
> UNIQUE (TAIL)
>
> It would be safe and desirable to move that loop invariant to before the
> UNIQUE.  Perhaps it won't matter in practice -- after all having N physical
> threads calculate it in parallel (just after the HEAD marker, but before the
> loop) will probably take no longer than a single thread doing it while the
> others wait.[*]

RETURNS_TWICE will make the invariant motion stop at UNIQUE (HEAD),
but it would have done that anyway.  It will also be a CSE barrier, thus

tem = global;
UNIQUE(HEAD)
tem2 = global;

will not CSE tem2 to tem.

Richard.

> nathan
>
> [*] ut it will take more power.


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Bernd Schmidt

On 10/22/2015 04:24 PM, Nathan Sidwell wrote:

+  else
+{ /* Parent will skip this parallel itself.  */ }


Here too - actually no need to have an empty else at all.


I wanted somewhere clear for the comment to go. (Actually, I think this
is the one the compiler warns about -- empty dangling else).


Don't recall such a warning, but that doesn't mean it isn't there. About 
the comment, it could either go just after the if, or above it, along 
the lines of

 /* We handle cases X, Y, Z here.  If none of them apply, then
the parent will skip this parallel itself.  */

In this particular loop you could also use continue; for the empty if 
conditions.



Bernd



Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Richard Biener
On Thu, Oct 22, 2015 at 3:24 PM, Nathan Sidwell  wrote:
> On 10/22/15 09:17, Jakub Jelinek wrote:
>>
>> On Thu, Oct 22, 2015 at 09:08:30AM -0400, Nathan Sidwell wrote:
>
>
>> I agree with Richard that it would be better to write more about what kind
>> of IL changes are acceptable with IFN_UNIQUE in the IL and what are not.
>> E.g. is inlining ok (I'd hope yes)?  Is function splitting ok (bet as long
>> as all IFN_UNIQUE calls stay in one or the other part, but not both)?
>
>
> Essentially, yes.  a set of IFN_UNIQUE form a group  which must not be
> separated  from each other.  The set is discovered implicitly by following
> the CFG (though I suppose we could add an identifying INT_CST operand or
> something equivalent).

I don't see how this is achieved though.  To achieve this you'd need data
dependences between them, sth like

token_1 = IFN_UNIQUE (HEAD);
...
token_2 = IFN_UNIQUE (TAIL, token_1);

not sure if that is enough (what is "separate from each other"?), for example
partial inlining might simply pass token_1 to the split part where only
IFN_UNIQUE (TAIL, token_1) would be in.  At least the above provides
ordering between the two IFN calls (which you achieve by having VDEFs
I guess, but then they are also barriers for memory optimizations).

Richard.

> nathan


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:26, Richard Biener wrote:

On Thu, Oct 22, 2015 at 4:01 PM, Nathan Sidwell  wrote:



RETURNS_TWICE will make the invariant motion stop at UNIQUE (HEAD),
but it would have done that anyway.  It will also be a CSE barrier, thus

tem = global;
UNIQUE(HEAD)
tem2 = global;

will not CSE tem2 to tem.


Yes, I can see it would behave like that for something globally visible.  What 
about state that isn't so visible?  (perhaps I'm worrying about something that 
doesn't matter, but I'd like to understand)


nathan


[Patch, fortran] PR58754 - [4.9/5 Regression] ICE on allocating character array with source

2015-10-22 Thread Paul Richard Thomas
Dear All,

This patch speaks for itself. It is by no means as comprehensive as
the work on ALLOCATE that Andre has done on 6 branch but has the
advantage for 5 branch that it is simple and steers the failing code
to the standard assignment, which should be safe.

Bootstraps and regtests on FC21/x86_64.

OK for 4.9 and 5 branches?

Cheers

Paul

2015-10-22  Paul Thomas  

PR fortran/58754
* trans-stmt.c (gfc_trans_allocate): Do not use the scalar
character assignment if the allocate expression sis an array
descriptor.

2015-10-22  Paul Thomas  

PR fortran/58754
* gfortran.dg/pr58754.f90: New test
Index: gcc/fortran/trans-stmt.c
===
*** gcc/fortran/trans-stmt.c(revision 229096)
--- gcc/fortran/trans-stmt.c(working copy)
*** gfc_trans_allocate (gfc_code * code)
*** 5629,5635 
  tmp = gfc_copy_class_to_class (expr3, to,
 nelems, upoly_expr);
}
! else if (code->expr3->ts.type == BT_CHARACTER)
{
  tmp = INDIRECT_REF_P (se.expr) ?
se.expr :
--- 5629,5636 
  tmp = gfc_copy_class_to_class (expr3, to,
 nelems, upoly_expr);
}
! else if (code->expr3->ts.type == BT_CHARACTER
!  && !GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se.expr)))
{
  tmp = INDIRECT_REF_P (se.expr) ?
se.expr :
Index: gcc/testsuite/gfortran.dg/pr58754.f90
===
*** gcc/testsuite/gfortran.dg/pr58754.f90   (revision 0)
--- gcc/testsuite/gfortran.dg/pr58754.f90   (working copy)
***
*** 0 
--- 1,24 
+ ! { dg-do compile }
+ !
+ ! Tests the fix for PR58754
+ !
+   type :: char_type
+ character, allocatable :: chr (:)
+   end type
+   character, allocatable :: c(:)
+   type(char_type) :: d
+   character :: t(1) = ["w"]
+ 
+   allocate (c (1), source = t)
+   if (any (c .ne. t)) call abort
+   c = ["a"]
+   if (any (c .ne. ["a"])) call abort
+   deallocate (c)
+ 
+ ! Check allocatable character components, whilst we are about it.
+   allocate (d%chr (2), source = [t, char (ichar (t) + 1)])
+   if (any (d%chr .ne. ["w", "x"])) call abort
+   d%chr = ["a","b","c","d"]
+   if (any (d%chr .ne. ["a","b","c","d"])) call abort
+   deallocate (d%chr)
+ end


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 04:28:17PM +0200, Bernd Schmidt wrote:
> On 10/22/2015 04:24 PM, Nathan Sidwell wrote:
> >>>+  else
> >>>+{ /* Parent will skip this parallel itself.  */ }
> >>
> >>Here too - actually no need to have an empty else at all.
> >
> >I wanted somewhere clear for the comment to go. (Actually, I think this
> >is the one the compiler warns about -- empty dangling else).
> 
> Don't recall such a warning, but that doesn't mean it isn't there. About the
> comment, it could either go just after the if, or above it, along the lines
> of

There is a warning for
  if (cond);
but not for
  if (cond)
;
or
  if (cond)
/* comment */ ;
which is the style used in various places throughout the compiler.

>  /* We handle cases X, Y, Z here.  If none of them apply, then
> the parent will skip this parallel itself.  */
> 
> In this particular loop you could also use continue; for the empty if
> conditions.

Jakub


Re: Constify host-side offload data`

2015-10-22 Thread H.J. Lu
On Thu, Oct 22, 2015 at 7:11 AM, Ilya Verbin  wrote:
> On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
>> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin  wrote:
>> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
>> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
>> >> > H.J.,
>> >> > Maybe linker should print some warning about joining writable + 
>> >> > nonwritable
>> >> > sections?  Here is a simple testcase:
>> >> >
>> >> > $ cat t1.s
>> >> > .section ".AAA", "a"
>> >> > .long 0x12345678
>> >> > $ cat t2.s
>> >> > .section ".AAA", "wa"
>> >> > .long 0x12345678
>> >> > $ as t1.s -o t1.o
>> >> > $ as t2.s -o t2.o
>> >> > $ ld -shared t1.o t2.o
>> >> > $ ls -lh a.out
>> >> > 2.1M a.out
>> >> >
>> >>
>> >> Does linker make AAA  writable? If yes, linker does what it
>> >> is told.
>> >
>> > Yes, it makes it writable, but why it also makes this?
>> >
>> >   [Nr] Name  Type Address   Offset
>> >Size  EntSize  Flags  Link  Info  Align
>> >   [ 0]   NULL   
>> >     0 0 0
>> >   [ 1] .hash HASH 00b0  00b0
>> >0028  0004   A   2 0 8
>> >   [ 2] .dynsym   DYNSYM   00d8  00d8
>> >0078  0018   A   3 2 8
>> >   [ 3] .dynstr   STRTAB   0150  0150
>> >0019     A   0 0 1
>> >   [ 4] .AAA  PROGBITS 0169  0169
>> >0008    WA   0 0 1
>> >   [ 5] .eh_frame PROGBITS 0178  0178
>> >     A   0 0 8
>> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  <-- 
>> > ???
>> >00b0  0010  WA   3 0 8
>> >   [ 7] .shstrtab STRTAB     00200380
>> >0049     0 0 1
>> >   [ 8] .symtab   SYMTAB     00200228
>> >0120  0018   9 9 8
>> >   [ 9] .strtab   STRTAB     00200348
>> >0038     0 0 1
>> >
>>
>> Linker groups input sections by section name and ors section
>> flags.
>
> Could you please help figure out how this number 0x200178 is calculated?
> ld -verbose doesn't show anything helpful.  It seems that something goes wrong
> during section-to-segment mapping, because when both .AAA have "wa" flags, we
> got small binary with 2 LOAD segments:
>   Type   Offset VirtAddr   PhysAddr
>  FileSizMemSiz  Flags  Align
>   LOAD   0x 0x 0x
>  0x01a8 0x01a8  R  20
>   LOAD   0x01a8 0x002001a8 0x002001a8
>  0x00b8 0x00b8  RW 20
>
> But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
> binary with only one big LOAD segment:
>   Type   Offset VirtAddr   PhysAddr
>  FileSizMemSiz  Flags  Align
>   LOAD   0x 0x 0x
>  0x00200228 0x00200228  RW 20
>
> BTW, gold produces small binary in both cases.
>

Please open a binutils bug with a testcase.


-- 
H.J.


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:28, Bernd Schmidt wrote:

On 10/22/2015 04:24 PM, Nathan Sidwell wrote:

+  else
+{ /* Parent will skip this parallel itself.  */ }


Here too - actually no need to have an empty else at all.


I wanted somewhere clear for the comment to go. (Actually, I think this
is the one the compiler warns about -- empty dangling else).


Don't recall such a warning, but that doesn't mean it isn't there.


found it:
.../nvptx.c:3007:47: warning: suggest braces around empty body in an 'else' 
statement [-Wempty-body]


I did what the compiler told me!  Always obey the compiler![*]


About the
comment, it could either go just after the if, or above it, along the lines of
  /* We handle cases X, Y, Z here.  If none of them apply, then
 the parent will skip this parallel itself.  */


Again I find that confusing -- much clearer to annotate the if cascade itself. 
(IIRC, I originally had a comment at the top, but it was demonstrably too 
confusing for me as the code was wrong)



In this particular loop you could also use continue; for the empty if 
conditions.


I think that's going to be confusing too.

nathan

[*] and hypnotoad.


Re: [OpenACC 1/11] UNIQUE internal function

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:30, Richard Biener wrote:

On Thu, Oct 22, 2015 at 3:24 PM, Nathan Sidwell  wrote:


Essentially, yes.  a set of IFN_UNIQUE form a group  which must not be
separated  from each other.  The set is discovered implicitly by following
the CFG (though I suppose we could add an identifying INT_CST operand or
something equivalent).


I don't see how this is achieved though.


Well, in practice it does.


 To achieve this you'd need data
dependences between them, sth like

token_1 = IFN_UNIQUE (HEAD);
...
token_2 = IFN_UNIQUE (TAIL, token_1);

not sure if that is enough (what is "separate from each other"?), for example
partial inlining might simply pass token_1 to the split part where only
IFN_UNIQUE (TAIL, token_1) would be in.


Yeah, such partial inlining will break.  Not encountered it happening though.


At least the above provides
ordering between the two IFN calls (which you achieve by having VDEFs
I guess, but then they are also barriers for memory optimizations).


Right, I think I'm relying on the compiler's lack of knowledge about what global 
state might be affected by the two calls to prevent it reordering them WRT 
eachother.  Is that what you meant?


(I did wonder about the need to add the kind of data dependency you describe, 
but found it unnecessary.)


nathan


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 07:23 AM, Nathan Sidwell wrote:
> On 10/22/15 10:05, Jakub Jelinek wrote:
>> On Thu, Oct 22, 2015 at 09:53:46AM -0400, Nathan Sidwell wrote:
>>> On 10/22/15 05:37, Jakub Jelinek wrote:
>>>
 And, I must say I'm at least missing testcases that check parsing
 but also
 runtime behavior of the vector or worker clause arguments (there
 is one gang (static:1) clause, but not the other clauses nor other
 styles of
 gang arguments.
>>>
>>> the static clause is only valid on gang.
>>
>> That is what I've figured out.
>> But it is unclear from the parsing what from these is allowed:
> 
> good questions.  As you may have guessed, I'm not the primary author of
> the parsing code.  Cesar's stepped up to address this.

I'll go into more detail later when I post the revised patch, but for
the time being, in response to your to your earlier question I've
inlined how the clauses should be translated in comments below:

> But it is unclear from the parsing what from these is allowed:

int v, w;
...
gang(26)  // equivalent to gang(num:26)
gang(v)   // gang(num:v)
vector(length: 16)  // vector(length: 16)
vector(length: v)  // vector(length: v)
vector(16)  // vector(length: 16)
vector(v)   // vector(length: v)
worker(num: 16)  // worker(num: 16)
worker(num: v)   // worker(num: 16)
worker(16)  // worker(num: 16)
worker(v)   // worker(num: 16)
gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
  // should be an error
gang(v, w)  // likewise
gang(static: 16, num: 5)  // gang(static: 16, num: 5)
gang(static: v, num: w)   // gang(static: v, num: w)
gang(num: 5, static: 4)   // gang(num: 5, static: 4)
gang(num: v, static: w)   // gang(num: v, static: w)

Also note that the static argument can accept '*'.

> and if the length: or num: part is really optional, then
> int length, num;
> vector(length)
> worker(num)
> gang(num, static: 6)
> gang(static: 5, num)
> should be also accepted (or subset thereof?).

Interesting question. The spec is unclear. It defines gang, worker and
vector as follows in section 2.7 in the OpenACC 2.0a spec:

  gang [( gang-arg-list )]
  worker [( [num:] int-expr )]
  vector [( [length:] int-expr )]

where gang-arg is one of:

  [num:] int-expr
  static: size-expr

and gang-arg-list may have at most one num and one static argument,
and where size-expr is one of:

  *
  int-expr

So I've interpreted that as a requirement that length and num must be
followed by an int-expr, whatever that is.

I've been meaning to cleanup to up the c and c++ front ends for a while
now, but I've been bogged down by other things. This is next on my todo
list.

Cesar


Re: [OpenACC 2/11] PTX backend changes

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:32, Jakub Jelinek wrote:

There is a warning for
   if (cond);
but not for
   if (cond)
 ;
or
   if (cond)
 /* comment */ ;
which is the style used in various places throughout the compiler.


Sadly, that's not quite accurate.  The warning occurs for all the empty if's you 
describe.  It doesn't occur for


  if (a)
;
  else ...

This is a case where I really want to say something about that else.

nathan


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Nathan Sidwell

On 10/22/15 10:47, Cesar Philippidis wrote:


Interesting question. The spec is unclear. It defines gang, worker and
vector as follows in section 2.7 in the OpenACC 2.0a spec:

   gang [( gang-arg-list )]
   worker [( [num:] int-expr )]
   vector [( [length:] int-expr )]

where gang-arg is one of:

   [num:] int-expr
   static: size-expr



the spec is intentionally unspecific about whether the exprs are 
integer-constant-expressions or integer-expressions.  Leaving it as an 
implementation choice.


nathan



Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Jakub Jelinek
On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
> > But it is unclear from the parsing what from these is allowed:
> 
> int v, w;
> ...
> gang(26)  // equivalent to gang(num:26)
> gang(v)   // gang(num:v)
> vector(length: 16)  // vector(length: 16)
> vector(length: v)  // vector(length: v)
> vector(16)  // vector(length: 16)
> vector(v)   // vector(length: v)
> worker(num: 16)  // worker(num: 16)
> worker(num: v)   // worker(num: 16)
> worker(16)  // worker(num: 16)
> worker(v)   // worker(num: 16)
> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>   // should be an error
> gang(v, w)  // likewise
> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
> gang(static: v, num: w)   // gang(static: v, num: w)
> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
> gang(num: v, static: w)   // gang(num: v, static: w)
> 
> Also note that the static argument can accept '*'.
> 
> > and if the length: or num: part is really optional, then
> > int length, num;
> > vector(length)
> > worker(num)
> > gang(num, static: 6)
> > gang(static: 5, num)
> > should be also accepted (or subset thereof?).
> 
> Interesting question. The spec is unclear. It defines gang, worker and
> vector as follows in section 2.7 in the OpenACC 2.0a spec:
> 
>   gang [( gang-arg-list )]
>   worker [( [num:] int-expr )]
>   vector [( [length:] int-expr )]
> 
> where gang-arg is one of:
> 
>   [num:] int-expr
>   static: size-expr
> 
> and gang-arg-list may have at most one num and one static argument,
> and where size-expr is one of:
> 
>   *
>   int-expr
> 
> So I've interpreted that as a requirement that length and num must be
> followed by an int-expr, whatever that is.

My reading of the above is that
vector(length)
is equivalent to
vector(length: length)
and
worker(num)
is equivalent to
vector(num: num)
etc.  Basically, neither length nor num aren't reserved identifiers,
so you can use them for variable names, and if
vector(v) is equivalent to vector(length: v), then
vector(length) should be equivalent to vector(length:length)
or
vector(length + 1) should be equivalent to vector(length: length+1)
static is a keyword that can't start an integral expression, so I guess
it is fine if you issue an expected : diagnostics after it.

In any case, please add a testcase (both C and C++) which covers all these
allowed variants (ideally one testcase) and rejected variants (another
testcase with dg-error).

This is still an easy case, as even the C FE has 2 tokens lookup.
E.g. for OpenMP map clause where
map (always, tofrom: x)
means one thing and
map (always, tofrom, y)
another one (map (tofrom: always, tofrom, y))
I had to do quite ugly things to get around this.

Jakub


Re: Benchmarks of v2 (was Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2))

2015-10-22 Thread David Malcolm
On Mon, 2015-10-19 at 16:51 +0200, Michael Matz wrote:
> Hi,
> 
> On Fri, 16 Oct 2015, David Malcolm wrote:
> 
> > This fixes much of the bloat seen for influence.i when sending ranges 
> > through for every token.
> 
> Yeah, I think that's on the right track.

Thanks.


> > This was with 8 bits allocated for packed ranges (which is probably 
> > excessive, but it makes debugging easier).
> 
> Probably in the end it should be done similar to how column bits are dealt 
> with, start with a reasonably low number (5 bits?) and increase if 
> necessary and budget allows (budget being column+range < N bits && range < 
> 8 bits, or so; so that range can't consume all of the column bits).

In my latest version, the range_bits is indeed a field of the ordinary
map, defaulting to 5 bits.  Right now it's either 5 or 0 bits for an
ordinary map's range: 5 as the default, with 0 for the case where the
map's column_bits value is also 0, which happens for large values of
location_t (above LINE_MAP_MAX_LOCATION_WITH_COLS), or for maps where
max_column_hint was very high.  I put it into the ordinary_map (rather
than the line_table) to handle those column_bits == 0 cases.  I'm not
sure I want to add any logic beyond that, for fear of over-complicating
things.

I have a messy patch kit with this idea, which bootstraps and passes
regression testing on x86_64; I'm going to tidy it up and post it for
review [the messy version is backed up here:
https://dmalcolm.fedorapeople.org/gcc/2015-10-22/rich-locations/
It's on top of r228618.] 

Here are some benchmark numbers (the final column is for the patch kit
above; control is r228618).

Minimal wallclock time (s) over 10 iterations
  (each experiment's % change is relative to 10 iterations of control
interleaved with that experiment)
  Control  v2 v2+every+token 
v2+packed+rangesv2+packed+ranges+20151021
--  -  -  -  
--  ---
kdecore.cc -g -O0   10.306510.268712 (-0.4%)  10.444528 (+1.9%)  10.398405 
(+1.3%)   10.382135 (+1.3%)
kdecore.cc -g -O1   27.026327.220654 (+0.7%)  27.622676 (+1.3%)  27.548755 
(+1.9%)   27.725537 (+2.5%)
kdecore.cc -g -O2   43.791744.02027 (+0.5%)   44.248477 (+0.8%)  43.843728 
(-0.6%)   44.034842 (-0.6%)
kdecore.cc -g -O3   47.471847.651101 (+0.4%)  48.005495 (+0.8%)  47.57282 
(+0.1%)48.149045 (+1.2%)
kdecore.cc -g -Os   31.678731.802829 (+0.4%)  32.033478 (+0.9%)  31.699171 
(-0.2%)   31.802537 (+0.2%)
empty.c -g -O0   0.012662  0.011932 (-5.8%)   0.013143 (+2.0%)   0.013208 
(+1.5%)0.011506 (+2.0%)
empty.c -g -O1   0.012685  0.012558 (-1.0%)   0.01279 (-2.8%)0.012424 
(+0.4%)0.011212 (-1.2%)
empty.c -g -O2   0.012694  0.012846 (+1.2%)   0.013175 (+2.0%)   0.013176 
(+1.6%)0.011495 (-0.0%)
empty.c -g -O3   0.012654  0.012699 (+0.4%)   0.012792 (+1.6%)   0.013198 
(+0.5%)0.010793 (+1.3%)
empty.c -g -Os   0.013057  0.012766 (-2.2%)   0.012885 (+1.5%)   0.01298 
(-1.6%) 0.011169 (-0.2%)
big-code.c -g -O03.29268   3.325748 (+1.0%)   3.303049 (+0.3%)   3.350572 
(+1.7%)3.352896 (+1.7%)
big-code.c -g -O1   15.701815.765014 (+0.4%)  15.759254 (+0.3%)  15.45175 
(-2.0%)15.454777 (-1.9%)
big-code.c -g -O2   22.575622.620187 (+0.2%)  22.605435 (+0.2%)  22.343609 
(-1.2%)   22.2913 (-1.4%)
big-code.c -g -O3   52.423652.590075 (+0.3%)  52.703835 (+0.5%)  51.9239 
(-1.0%) 51.86898 (-1.1%)
big-code.c -g -Os   21.154 21.253598 (+0.5%)  21.260138 (+0.5%)  20.907407 
(-1.3%)   20.870625 (-1.3%)
influence.i -g -O0   0.148229  0.149518 (+0.9%)   0.156262 (+5.1%)   0.150652 
(+1.0%)0.147663 (+1.4%)
influence.i -g -O1   0.387397  0.38993 (+0.7%)0.396655 (+2.3%)   0.3916 
(+0.8%)  0.388918 (+1.0%)
influence.i -g -O2   0.587514  0.589604 (+0.4%)   0.59651 (+1.4%)0.585223 
(-0.6%)0.583341 (-0.4%)
influence.i -g -O3   1.27356   1.280514 (+0.5%)   1.287596 (+1.0%)   1.273018 
(-0.1%)1.2723 (+0.0%)
influence.i -g -Os   0.526045  0.527579 (+0.3%)   0.535635 (+1.7%)   0.528192 
(+0.2%)0.525308 (+0.3%)


Maximal ggc memory (kb)
  Control  v2   v2+every+token
v2+packed+rangesv2+packed+ranges+20151021
--  -  ---    
--  ---
kdecore.cc -g -O0  650337  654435 (+0.6%)   711775 (+9.4%)659372 
(+1.4%)  657465 (+1.1%)
kdecore.cc -g -O1  931966  940144 (+0.9%)   989384 (+6.2%)954779 
(+2.4%)  952734 (+2.2%)
kdecore.cc -g -O2 1125325  1133514 (+0.7%)  1182384 (+5.1%)   1136459 
(+1.0%) 1134412 (+0.8%)
kdecore.cc -g -O3 1221408  1229596 (+0.7%)  1278658 (+4.7%)   1232688 
(+0.9%) 1230634 (+0.8%)
kdecore.cc -g -Os  867140  871235 (+0.5%)   928700 (+7.1%)884304 
(+2.0%)  874049 (+0.8%)
empty.c -g -O0   1189  1192 (+0.3%) 1193 (+0.3%)  

Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>> But it is unclear from the parsing what from these is allowed:
>>
>> int v, w;
>> ...
>> gang(26)  // equivalent to gang(num:26)
>> gang(v)   // gang(num:v)
>> vector(length: 16)  // vector(length: 16)
>> vector(length: v)  // vector(length: v)
>> vector(16)  // vector(length: 16)
>> vector(v)   // vector(length: v)
>> worker(num: 16)  // worker(num: 16)
>> worker(num: v)   // worker(num: 16)
>> worker(16)  // worker(num: 16)
>> worker(v)   // worker(num: 16)
>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>   // should be an error
>> gang(v, w)  // likewise
>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>> gang(static: v, num: w)   // gang(static: v, num: w)
>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>> gang(num: v, static: w)   // gang(num: v, static: w)
>>
>> Also note that the static argument can accept '*'.
>>
>>> and if the length: or num: part is really optional, then
>>> int length, num;
>>> vector(length)
>>> worker(num)
>>> gang(num, static: 6)
>>> gang(static: 5, num)
>>> should be also accepted (or subset thereof?).
>>
>> Interesting question. The spec is unclear. It defines gang, worker and
>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>
>>   gang [( gang-arg-list )]
>>   worker [( [num:] int-expr )]
>>   vector [( [length:] int-expr )]
>>
>> where gang-arg is one of:
>>
>>   [num:] int-expr
>>   static: size-expr
>>
>> and gang-arg-list may have at most one num and one static argument,
>> and where size-expr is one of:
>>
>>   *
>>   int-expr
>>
>> So I've interpreted that as a requirement that length and num must be
>> followed by an int-expr, whatever that is.
> 
> My reading of the above is that
> vector(length)
> is equivalent to
> vector(length: length)
> and
> worker(num)
> is equivalent to
> vector(num: num)
> etc.  Basically, neither length nor num aren't reserved identifiers,
> so you can use them for variable names, and if
> vector(v) is equivalent to vector(length: v), then
> vector(length) should be equivalent to vector(length:length)
> or
> vector(length + 1) should be equivalent to vector(length: length+1)
> static is a keyword that can't start an integral expression, so I guess
> it is fine if you issue an expected : diagnostics after it.

You're correct. I overlooked that 'int length, num' declaration.

> In any case, please add a testcase (both C and C++) which covers all these
> allowed variants (ideally one testcase) and rejected variants (another
> testcase with dg-error).
> 
> This is still an easy case, as even the C FE has 2 tokens lookup.
> E.g. for OpenMP map clause where
> map (always, tofrom: x)
> means one thing and
> map (always, tofrom, y)
> another one (map (tofrom: always, tofrom, y))
> I had to do quite ugly things to get around this.

I'll add more test cases.

Thanks,
Cesar



Re: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-10-22 Thread Alan Lawrence

On closer inspection I think you can also remove this guy (from loongson.md):

(define_insn "reduc_uplus_v8qi"
  [(set (match_operand:V8QI 0 "register_operand" "=f")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "f")]
 UNSPEC_LOONGSON_BIADD))]
  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
  "biadd\t%0,%1"
  [(set_attr "type" "fabs")])

as reduc_plus_scal_ handles both signed and unsigned cases.

Cheers, Alan



Re: Constify host-side offload data`

2015-10-22 Thread Ilya Verbin
On Thu, Oct 22, 2015 at 07:35:55 -0700, H.J. Lu wrote:
> On Thu, Oct 22, 2015 at 7:11 AM, Ilya Verbin  wrote:
> > On Wed, Oct 21, 2015 at 10:44:56 -0700, H.J. Lu wrote:
> >> On Wed, Oct 21, 2015 at 10:42 AM, Ilya Verbin  wrote:
> >> > On Wed, Oct 21, 2015 at 10:38:10 -0700, H.J. Lu wrote:
> >> >> On Wed, Oct 21, 2015 at 10:33 AM, Ilya Verbin  wrote:
> >> >> > H.J.,
> >> >> > Maybe linker should print some warning about joining writable + 
> >> >> > nonwritable
> >> >> > sections?  Here is a simple testcase:
> >> >> >
> >> >> > $ cat t1.s
> >> >> > .section ".AAA", "a"
> >> >> > .long 0x12345678
> >> >> > $ cat t2.s
> >> >> > .section ".AAA", "wa"
> >> >> > .long 0x12345678
> >> >> > $ as t1.s -o t1.o
> >> >> > $ as t2.s -o t2.o
> >> >> > $ ld -shared t1.o t2.o
> >> >> > $ ls -lh a.out
> >> >> > 2.1M a.out
> >> >> >
> >> >>
> >> >> Does linker make AAA  writable? If yes, linker does what it
> >> >> is told.
> >> >
> >> > Yes, it makes it writable, but why it also makes this?
> >> >
> >> >   [Nr] Name  Type Address   Offset
> >> >Size  EntSize  Flags  Link  Info  Align
> >> >   [ 0]   NULL   
> >> >     0 0 0
> >> >   [ 1] .hash HASH 00b0  00b0
> >> >0028  0004   A   2 0 8
> >> >   [ 2] .dynsym   DYNSYM   00d8  00d8
> >> >0078  0018   A   3 2 8
> >> >   [ 3] .dynstr   STRTAB   0150  0150
> >> >0019     A   0 0 1
> >> >   [ 4] .AAA  PROGBITS 0169  0169
> >> >0008    WA   0 0 1
> >> >   [ 5] .eh_frame PROGBITS 0178  0178
> >> >     A   0 0 8
> >> >   [ 6] .dynamic  DYNAMIC  00200178  00200178  
> >> > <-- ???
> >> >00b0  0010  WA   3 0 8
> >> >   [ 7] .shstrtab STRTAB     00200380
> >> >0049     0 0 1
> >> >   [ 8] .symtab   SYMTAB     00200228
> >> >0120  0018   9 9 8
> >> >   [ 9] .strtab   STRTAB     00200348
> >> >0038     0 0 1
> >> >
> >>
> >> Linker groups input sections by section name and ors section
> >> flags.
> >
> > Could you please help figure out how this number 0x200178 is calculated?
> > ld -verbose doesn't show anything helpful.  It seems that something goes 
> > wrong
> > during section-to-segment mapping, because when both .AAA have "wa" flags, 
> > we
> > got small binary with 2 LOAD segments:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x01a8 0x01a8  R  20
> >   LOAD   0x01a8 0x002001a8 0x002001a8
> >  0x00b8 0x00b8  RW 20
> >
> > But when one .AAA has "a" flag, and another .AAA has "wa" flag, we got huge
> > binary with only one big LOAD segment:
> >   Type   Offset VirtAddr   PhysAddr
> >  FileSizMemSiz  Flags  Align
> >   LOAD   0x 0x 0x
> >  0x00200228 0x00200228  RW 20
> >
> > BTW, gold produces small binary in both cases.
> >
> 
> Please open a binutils bug with a testcase.

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=19162

  -- Ilya


Re: RFA/RFC: insns that do not start a source line

2015-10-22 Thread Bernd Schmidt

On 10/22/2015 04:19 PM, Nick Clifton wrote:

It is in cfgrtl.c:fixup_reorder_chain()


   nb = split_edge (e);
   if (!INSN_P (BB_END (nb)))
 BB_END (nb) = emit_insn_after_noloc (gen_nop (), BB_END (nb),
  nb);
   INSN_LOCATION (BB_END (nb)) = e->goto_locus;


That looks rather intentional. That code seems to be trying to make sure 
that we can set a breakpoint anywhere we want by ensuring that an 
instruction with the right location exists.


What's happening is that we enter with an edge
  $6 =  15)>
which has a corresponding goto_locus
  $17 = {file = 0x7fffdd96 "loop-break.c", line = 34, column = 16, 
data = 0x70840420, sysp = false}
which we compare to the location for the jump insn at the end, which is 
however
  $15 = {file = 0x7fffdd96 "loop-break.c", line = 34, column = 8, 
data = 0x70840420, sysp = false}


Further investigation shows that we have enter expand_gimple_cond, which 
calls set_curr_insn_location with the location of the condition (i.e. 
$15), and all insns created by ix86_expand_branch get that location. 
There is code in expand_gimple_cond to switch locations around, but at 
that point true_edge->goto_locus and false_edge->goto_locus are equal, 
and contain something different.


So I'm not entirely sure yet what's supposed to happen here, but I think 
the problem could well be in the expansion phase. The proposed patch 
can't be right, because the nop in question is apparently only generated 
for the purpose of holding a location.



Bernd


Re: RFA/RFC: insns that do not start a source line

2015-10-22 Thread Bernd Schmidt

On 10/22/2015 05:08 PM, Bernd Schmidt wrote:


So I'm not entirely sure yet what's supposed to happen here, but I think
the problem could well be in the expansion phase.


Forgot to mention another possibility that crossed my mind: don't 
compare locations for equality in fixup_reorder_chain, instead ignore 
column numbers.



Bernd



more accurate omp in fortran

2015-10-22 Thread Cesar Philippidis
Currently, for certain omp and oacc errors the fortran will inaccurately
report exactly where in the omp/acc construct the error has occurred. E.g.

   !$acc parallel copy (i) copy (i) copy (j)
   1
Error: Symbol ‘i’ present on multiple clauses at (1)

instead of

   !$acc parallel copy (i) copy (i) copy (j)
1
Error: Symbol ‘i’ present on multiple clauses at (1)

The problem here is how the front end uses the locus for the construct
and not the individual clause. As a result that diagnostic pointer
points to the end of the construct.

This patch teaches gfc_resolve_omp_clauses how to use the locus of each
individual clause instead of the construct when reporting errors
involving OMP_LIST_ clauses (which are typically clauses involving
variables). It's still not perfect, but it does improve the quality of
the error reporting a little. In particular, in openacc, other compilers
are somewhat lenient in allowing variables to appear in multiple
clauses, e.g. copyin (foo) copyout (foo), but this is clearly forbidden
by the spec. I received some bug reports complaining that gfortran's
errors aren't accurate.

I've also split off the check for variables appearing in multiple
clauses into a separate function. It's a little overkill for trunk right
now, but it is used quite a bit in gomp4 for oacc declare.

I've tested these changes on x86_64. Is this ok for trunk?

Cesar


2015-10-22  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (gfc_omp_namespace): Add locus where member.
	* openmp.c (gfc_match_omp_variable_list): Set where for each list
	item found.
	(resolve_omp_duplicate_list): New function.
	(oacc_compatible_clauses): Delete.
	(resolve_omp_clauses): Remove where argument and use the where
	gfc_omp_namespace member when reporting errors.  Use
	resolve_omp_duplicate_list to check for variables appearing in
	mulitple clauses.
	(resolve_omp_do): Update call to resolve_omp_clauses.
	(resolve_oacc_loop): Likewise.
	(gfc_resolve_oacc_directive): Likewise.
	(gfc_resolve_omp_directive): Likewise.
	(gfc_resolve_omp_declare_simd): Likewise.

	gcc/testsuite/
	* gfortran.dg/gomp/intentin1.f90: Adjust copyprivate warning.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b2894cc..93adb7b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
+  locus where;
 }
 gfc_omp_namelist;
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 3c12d8e..56a95d4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	}
 	  tail->sym = sym;
 	  tail->expr = expr;
+	  tail->where = cur_loc;
 	  goto next_item;
 	case MATCH_NO:
 	  break;
@@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
+	  tail->where = cur_loc;
 	}
 
 next_item:
@@ -2832,36 +2834,47 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Returns true if clause in list 'list' is compatible with any of
-   of the clauses in lists [0..list-1].  E.g., a reduction variable may
-   appear in both reduction and private clauses, so this function
-   will return true in this case.  */
+/* Check if a variable appears in multiple clauses.  */
 
-static bool
-oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
-			   gfc_symbol *sym, bool openacc)
+static void
+resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
+			int list)
 {
   gfc_omp_namelist *n;
+  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
 
-  if (!openacc)
-return false;
+  /* OpenACC reduction clauses are compatible with everything.  We only
+ need to check if a reduction variable is used more than once.  */
+  if (openacc && list == OMP_LIST_REDUCTION)
+{
+  hash_set reductions;
 
-  if (list != OMP_LIST_REDUCTION)
-return false;
+  for (n = clause_list; n; n = n->next)
+	{
+	  if (reductions.contains (n->sym))
+	gfc_error (error_msg, n->sym->name, &n->where);
+	  else
+	reductions.add (n->sym);
+	}
 
-  for (n = clauses->lists[OMP_LIST_FIRST]; n; n = n->next)
-if (n->sym == sym)
-  return true;
+  return;
+}
 
-  return false;
+  /* Ensure that variables are only used in one clause.  */
+  for (n = clause_list; n; n = n->next)
+{
+  if (n->sym->mark)
+	gfc_error (error_msg, n->sym->name, &n->where);
+  else
+	n->sym->mark = 1;
+}
 }
 
 /* OpenMP directive resolving routines.  */
 
 static void
-resolve_omp_clauses (gfc_code *code, locus *where,
-		 gfc_omp_clauses *omp_clauses, gfc_namespace *ns,
-		 bool openacc = false)
+resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
+		 gfc_namespa

[hsa] Also copy distribute pre-body

2015-10-22 Thread Martin Jambor
Hi,

I had to interrupt my porting to OpenMP 4.5 to fix this bug.  We were
not copying the distribute loop pre-body to before target, thus
setting the grid size of quite a few kernels to bogus values.  Fixed
thusly and committed to the branch.

Thanks,

Martin


2015-10-22  Martin Jambor  

* omp-low.c (process_kernel_body_copy): Also copy pre-bodies of
distribute and inner loop to before target.
(attempt_target_gridification): Do not copy inner loop pre-body.
---
 gcc/omp-low.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a094163..b568175 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12987,6 +12987,9 @@ process_kernel_body_copy (gimple_seq seq, 
gimple_stmt_iterator *dst,
   gomp_for *dist;
   if ((dist = dyn_cast  (stmt)))
{
+  gimple_seq prebody = gimple_omp_for_pre_body (dist);
+  if (prebody)
+copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
  gimple_omp_for_set_kernel_phony (dist, true);
  stmt = copy_leading_local_assignments (gimple_omp_body (dist), dst,
 tgt_bind, wi);
@@ -12999,6 +13002,10 @@ process_kernel_body_copy (gimple_seq seq, 
gimple_stmt_iterator *dst,
 tgt_bind, wi);
   gomp_for *inner_loop = as_a  (stmt);
   gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_KERNEL_BODY);
+  gimple_seq prebody = gimple_omp_for_pre_body (inner_loop);
+  if (prebody)
+copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+
   return inner_loop;
 }
 
@@ -13046,11 +13053,6 @@ attempt_target_gridification (gomp_target *target, 
gimple_stmt_iterator *gsi,
 (gimple_bind_body_ptr (as_a  (gimple_omp_body (target))),
  gpukernel);
 
-  /* Copy loop pre-body before target: */
-  gimple_seq prebody = gimple_omp_for_pre_body (inner_loop);
-  if (prebody)
-copy_leading_local_assignments (prebody, gsi, tgt_bind, &wi);
-
   target->kernel_group_size = group_size;
   size_t collapse = inner_loop->collapse;
   target->kernel_collapse = collapse;
-- 
2.6.0



Change behavior of -fsched-verbose option

2015-10-22 Thread Nikolai Bozhenov

Hi!

Currently -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. It would be fine if
there were the only scheduling pass. But for example for AArch64
there are 3 scheduling passes in the default pipeline: sched1,
fusion and sched2. So, when passing options like
-fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for
sched1 in the file and a mess of fusion/sched2 logs in the console.

In fact, currently there's no way to tell GCC that I want extremely
verbose logs for sched1 and I want no logs for all other passes.
Especially to the console.

I suggest disabling such redirection in the scheduler and omitting
debugging output for passes without dump_file. I believe a better
way to redirect printing to the stderr is to use options like
-fdump-rtl-sched=stderr. The attached patch implements the
suggestion.

Anyway, the old behavior may be reproduced with options
-fsched-verbose=7 -fdump-rtl-sched1 -fdump-rtl-{sched_fusion,sched2}=stderr
if it is really necessary.

The patch has been bootstrapped and regtested on x86_64.

Thanks,
Nikolai
2015-10-22  Nikolai Bozhenov  

	* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
	* common.opt (-fsched-verbose): Set default value to 1.
	* invoke.texi (-fsched-verbose): Update the option's description.

diff --git a/gcc/common.opt b/gcc/common.opt
index 224d3ad..5c23d29 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1973,7 +1973,7 @@ Common Report Var(flag_schedule_speculative_load_dangerous) Optimization
 Allow speculative motion of more loads
 
 fsched-verbose=
-Common RejectNegative Joined UInteger Var(sched_verbose_param)
+Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1)
 -fsched-verbose=	Set the verbosity level of the scheduler
 
 fsched2-use-superblocks
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 027ce96..d3cb88c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7403,12 +7403,7 @@ The @var{number} should be different for every file you compile.
 @item -fsched-verbose=@var{n}
 @opindex fsched-verbose
 On targets that use instruction scheduling, this option controls the
-amount of debugging output the scheduler prints.  This information is
-written to standard error, unless @option{-fdump-rtl-sched1} or
-@option{-fdump-rtl-sched2} is specified, in which case it is output
-to the usual dump listing file, @file{.sched1} or @file{.sched2}
-respectively.  However for @var{n} greater than nine, the output is
-always printed to standard error.
+amount of debugging output the scheduler prints to the dump files.
 
 For @var{n} greater than zero, @option{-fsched-verbose} outputs the
 same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}.
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 46751fe..32506cb 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -206,17 +206,14 @@ static int modulo_last_stage;
 
 /* sched-verbose controls the amount of debugging output the
scheduler prints.  It is controlled by -fsched-verbose=N:
-   N>0 and no -DSR : the output is directed to stderr.
-   N>=10 will direct the printouts to stderr (regardless of -dSR).
-   N=1: same as -dSR.
+   N=0: no debugging output.
+   N=1: default value.
N=2: bb's probabilities, detailed ready list info, unit/insn info.
N=3: rtl at abort point, control-flow, regions info.
N=5: dependences info.  */
-
 int sched_verbose = 0;
 
-/* Debugging file.  All printouts are sent to dump, which is always set,
-   either to stderr, or to the dump listing file (-dRS).  */
+/* Debugging file.  All printouts are sent to dump. */
 FILE *sched_dump = 0;
 
 /* This is a placeholder for the scheduler parameters common
@@ -7222,17 +7219,14 @@ set_priorities (rtx_insn *head, rtx_insn *tail)
   return n_insn;
 }
 
-/* Set dump and sched_verbose for the desired debugging output.  If no
-   dump-file was specified, but -fsched-verbose=N (any N), print to stderr.
-   For -fsched-verbose=N, N>=10, print everything to stderr.  */
+/* Set sched_dump and sched_verbose for the desired debugging output. */
 void
 setup_sched_dump (void)
 {
   sched_verbose = sched_verbose_param;
-  if (sched_verbose_param == 0 && dump_file)
-sched_verbose = 1;
-  sched_dump = ((sched_verbose_param >= 10 || !dump_file)
-		? stderr : dump_file);
+  sched_dump = dump_file;
+  if (!dump_file)
+sched_verbose = 0;
 }
 
 /* Allocate data for register pressure sensitive scheduling.  */


Re: [vec-cmp, patch 1/6] Add optabs for vector comparison

2015-10-22 Thread Jeff Law

On 10/22/2015 04:35 AM, Ilya Enkovich wrote:

2015-10-21 20:25 GMT+03:00 Jeff Law :

On 10/08/2015 08:52 AM, Ilya Enkovich wrote:


Hi,

This series introduces autogeneration of vector comparison and its support
on i386 target.  It lets comparison statements to be vectorized into vector
comparison instead of VEC_COND_EXPR.  This allows to avoid some restrictions
implied by boolean patterns.  This series applies on top of bolean vectors
series [1].

This patch introduces optabs for vector comparison.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00215.html

Thanks,
Ilya
--
gcc/

2015-10-08  Ilya Enkovich  

 * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask
results.
 * optabs-query.h (get_vec_cmp_icode): New.
 * optabs-tree.c (expand_vec_cmp_expr_p): New.
 * optabs-tree.h (expand_vec_cmp_expr_p): New.
 * optabs.c (vector_compare_rtx): Add OPNO arg.
 (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
 (expand_vec_cmp_expr): New.
 * optabs.def (vec_cmp_optab): New.
 (vec_cmpu_optab): New.
 * optabs.h (expand_vec_cmp_expr): New.
 * tree-vect-generic.c (expand_vector_comparison): Add vector
 comparison optabs check.


diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 3b03338..aa863cf 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -320,6 +320,19 @@ supportable_convert_operation (enum tree_code code,
 return false;
   }

+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+   TYPE_MODE (mask_type),
+   TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+


Nothing inherently wrong with the code, but it seems like it's in the wrong
place.  Why optabs-tree rather than optabs-query?


Because it uses tree type for arguments. There is no single tree usage
in optabs-query.c. I think expand_vec_cond_expr_p is in optabs-tree
for the same reason.
Note that expand_vec_cond_expr_p doesn't rely on enum insn code.  Well, 
it relies on enum insn code being an integer and CODE_FOR_nothing always 
having the value zero, which is probably worse.


We should clean both of these up so that:

  1. We don't need enum insn_code in optabs-tree
  2. We don't implicitly rely on CODE_FOR_nothing == 0

It may be as simple as a adding a predicate function to optabs-query 
that returns true/false if there's a suitable icode, then using that 
predicate in optabs-tree.


jeff




Re: Change behavior of -fsched-verbose option

2015-10-22 Thread Bernd Schmidt

On 10/22/2015 05:38 PM, Nikolai Bozhenov wrote:


Currently -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. It would be fine if
there were the only scheduling pass. But for example for AArch64
there are 3 scheduling passes in the default pipeline: sched1,
fusion and sched2. So, when passing options like
-fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for
sched1 in the file and a mess of fusion/sched2 logs in the console.



* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
* common.opt (-fsched-verbose): Set default value to 1.
* invoke.texi (-fsched-verbose): Update the option's description.


That looks sensible, I agree that dumping to stderr does not make much 
sense. When in the debugger you can still set sched_dump to stderr if 
you want.



+  if (!dump_file)
+sched_verbose = 0;


Do you even need that? Otherwise, patch OK.


Bernd


Commit: MSP430: Pass silicon errata options on to the assembler

2015-10-22 Thread Nick Clifton
Hi Guys,

  I am checking in the patch below to allow gcc to pass the new MSP430
  -msilicon-errata and -msilicon-errata-warn option on to the
  assembler.

Cheers
  Nick

gcc/ChangeLog
2015-10-22  Nick Clifton  

* config/msp430/msp430.opt: Add -msilicon-errata and
-msilicon-errata-warn.
* config/msp430/msp430.h (ASM_SPEC): Pass new options on to
assembler.
* doc/invoke.texi: Document new options.

Index: gcc/config/msp430/msp430.h
===
--- gcc/config/msp430/msp430.h  (revision 229176)
+++ gcc/config/msp430/msp430.h  (working copy)
@@ -56,6 +56,8 @@
   "%{mrelax=-mQ} " /* Pass the relax option on to the assembler.  */ \
   "%{mlarge:-ml} " /* Tell the assembler if we are building for the LARGE 
pointer model.  */ \
   "%{!msim:-md} %{msim:%{mlarge:-md}} " /* Copy data from ROM to RAM if 
necessary.  */ \
+  "%{msilicon-errata=*:-msilicon-errata=%*} " /* Pass on -msilicon-errata.  */ 
\
+  "%{msilicon-errata-warn=*:-msilicon-errata-warn=%*} " /* Pass on 
-msilicon-errata-warn.  */ \
   "%{ffunction-sections:-gdwarf-sections} " /* If function sections are being 
created then create DWARF line number sections as well.  */
 
 /* Enable linker section garbage collection by default, unless we
Index: gcc/config/msp430/msp430.opt
===
--- gcc/config/msp430/msp430.opt(revision 229176)
+++ gcc/config/msp430/msp430.opt(working copy)
@@ -80,3 +80,11 @@
 
 EnumValue
 Enum(msp430_regions) String(upper) Value(UPPER)
+
+msilicon-errata=
+Target Joined RejectNegative Report ToLower
+Passes on a request to the assembler to enable fixes for various silicon errata
+
+msilicon-errata-warn=
+Target Joined RejectNegative Report ToLower
+Passes on a request to the assembler to warn about various silicon errata
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 229176)
+++ gcc/doc/invoke.texi (working copy)
@@ -841,6 +841,7 @@
 @emph{MSP430 Options}
 @gccoptlist{-msim -masm-hex -mmcu= -mcpu= -mlarge -msmall -mrelax @gol
 -mcode-region= -mdata-region= @gol
+-msilicon-errata= -msilicon-errata-warn= @gol
 -mhwmult= -minrt}
 
 @emph{NDS32 Options}
@@ -18449,6 +18450,16 @@
 linker script and how it assigns the standard sections (.text, .data
 etc) to the memory regions.
 
+@item -msilicon-errata=
+@opindex msilicon-errata
+This option passes on a request to assembler to enable the fixes for
+the names silicon errata.
+
+@item -msilicon-errata-warn=
+@opindex msilicon-errata-warn
+This option passes on a request to the assembler to enable warning
+messages when a silicon errata might need to be applied.
+
 @end table
 
 @node NDS32 Options


  1   2   >