[patch, fortran] Fix PR 83540

2017-12-26 Thread Thomas Koenig

Hello world,

this rather self-explanatory patch makes sure that inlining matmul
does not rely on reallocation on assignment when reallocation on
assignment is disabled (-fno-realloc-lhs).

Regression-tested. OK for trunk?

Regards

Thomas

2017-12-25  Thomas Koenig  

PR fortran/83540
* frontend-passes.c (create_var): If an array to be created
has unknown size and -fno-realloc-lhs is in effect,
return NULL.

2017-12-25  Thomas Koenig  

PR fortran/83540
* gfortran.dg/inline_matmul_20.f90: New test.
Index: frontend-passes.c
===
--- frontend-passes.c	(revision 255788)
+++ frontend-passes.c	(working copy)
@@ -720,6 +720,11 @@ create_var (gfc_expr * e, const char *vname)
   if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e))
     return gfc_copy_expr (e);
 
+  /* Creation of an array of unknown size requires realloc on assignment.
+     If that is not possible, just return NULL.  */
+  if (flag_realloc_lhs == 0 && e->rank > 0 && e->shape == NULL)
+    return NULL;
+
   ns = insert_block ();
 
   if (vname)
! { dg-do  run }
! { dg-additional-options "-fno-realloc-lhs -ffrontend-optimize" }
! This used to segfault at runtime.
! Original test case by Harald Anlauf.
program gfcbug142
  implicit none
  real, allocatable :: b(:,:)
  integer :: n = 5
  character(len=20) :: line
  allocate (b(n,n))
  call random_number (b)
  write (unit=line,fmt='(2I5)') shape (matmul (b, transpose (b)))
  if (line /= '    5    5') call abort
end program gfcbug142
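
The guard added to create_var can be looked at in isolation with a small model. This is only a sketch: struct expr_sketch and can_create_temp are hypothetical names that stand in for the rank and shape fields of gfc_expr and for the new early return; nothing here is taken from frontend-passes.c beyond the condition itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two gfc_expr fields the guard inspects.  */
struct expr_sketch
{
  int rank;          /* 0 for a scalar.  */
  const int *shape;  /* NULL when the extents are unknown at compile time.  */
};

/* Return 1 if a temporary could be created for E, 0 if the caller has to
   give up.  The 0 case models the "return NULL" added by the patch;
   REALLOC_LHS models flag_realloc_lhs.  */
static int
can_create_temp (const struct expr_sketch *e, int realloc_lhs)
{
  assert (e != NULL);
  if (realloc_lhs == 0 && e->rank > 0 && e->shape == NULL)
    return 0;
  return 1;
}
```

In this model only the combination "array, unknown shape, no reallocation on assignment" is refused; scalars and arrays of known shape are unaffected by the flag.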


[PATCH, PR82391] Fold acc_on_device with const arg

2017-12-26 Thread Tom de Vries

Hi,

the openacc standard states: If the acc_on_device routine has
a compile-time constant argument, it evaluates at compile time to a 
constant.


The purpose of this is to remove non-applicable device-specific code 
during compilation.  In the case of asm insns which are device-specific, 
removal is even needed to be able to compile for host.


When optimizing, the compiler complies with this requirement, through 
gimple_fold_builtin_acc_on_device and following optimizations. But that 
doesn't work at -O0.


Consequently, a test-case such as loop-auto-1.c that has 
device-specific asm insns:

...
#pragma acc routine seq
static int __attribute__((noinline)) place ()
{
  int r = 0;

  if (acc_on_device (acc_device_nvidia))
    {
      int g = 0, w = 0, v = 0;

      __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
      __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
      __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
      r = (g << 16) | (w << 8) | v;
    }
  return r;
}
...
skips -O0:
...
/* This code uses nvptx inline assembly guarded with acc_on_device,
   which is not optimized away at -O0, and then confuses the target
   assembler.
   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */ 


...


This patch adds folding of acc_on_device with constant argument at -O0. 
This folding is done by fold_builtin_acc_on_device_cst_arg during 
pass_oacc_device_lower, which also propagates the folded value to its 
uses, which allows TODO_cleanup_cfg to remove the dead code.
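
To illustrate why folding the call matters, here is a minimal sketch of the idea in plain C. Everything in it is hypothetical: the enum dev_sketch values are not the real acc_device_t values, fold_on_device_sketch is not the function from the patch, and the rule it uses (host compilation answers true for the host and "none" kinds, accelerator compilation for "not host" and its own kind) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative device kinds only; NOT the real acc_device_t values.  */
enum dev_sketch { DEV_NONE, DEV_HOST, DEV_NOT_HOST, DEV_NVIDIA };

/* Value that a call with constant argument DEV is assumed to fold to,
   depending on whether we compile for the host or for an nvidia
   accelerator.  */
static int
fold_on_device_sketch (enum dev_sketch dev, bool compiling_for_accel)
{
  if (compiling_for_accel)
    return dev == DEV_NOT_HOST || dev == DEV_NVIDIA;
  return dev == DEV_HOST || dev == DEV_NONE;
}

/* Once the condition is a known 0, the guarded block is dead and can be
   removed, so device-only code never reaches the host assembler.  */
static int
place_sketch (bool compiling_for_accel)
{
  int r = 0;

  if (fold_on_device_sketch (DEV_NVIDIA, compiling_for_accel))
    r = 42;  /* Stands in for the nvptx-only asm statements.  */
  return r;
}
```

For a host compilation the guard in place_sketch evaluates to 0, which is the situation in which the branch containing the nvptx asm has to disappear even at -O0.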


This solution works fine for C, but for C++ things are a bit more 
complicated. In C, the 'int acc_on_device (acc_device_t)' maps onto the 
'int __builtin_acc_on_device (int)', but for C++ that's not the case. 
The current solution for that problem is an inline function in 
openacc.h, but at -O0 that adds too much indirection to still be able to 
remove the dead code. The easiest solution is:

...
#define acc_on_device(dev) __builtin_acc_on_device ((int)dev)
...
but that's not strictly compliant with the openacc standard, which 
requires an openacc interface function 'int 
acc_on_device(acc_device_t)', not a macro.
So we end up with a kludge in oacc_xform_acc_on_device that maps the 
openacc interface function acc_on_device onto the builtin function.



Bootstrapped and reg-tested on x86_64.

Built and reg-tested for x86_64 with nvptx accelerator.

OK for trunk?

Thanks,
- Tom
Fold acc_on_device with const arg

2017-12-22  Tom de Vries  

	PR libgomp/82391
	* omp-offload.c (fold_builtin_acc_on_device_cst_arg)
	(oacc_xform_acc_on_device, oacc_device_lower_non_offloaded): New
	functions.
	(execute_oacc_device_lower): Call oacc_device_lower_non_offloaded.
	Call oacc_xform_acc_on_device.

	* openacc.h [__cplusplus] (acc_on_device (int)): Remove.
	[__cplusplus] (acc_on_device (acc_device_t)): Remove definition, and
	declare instead with __builtin_acc_on_device attributes.
	* testsuite/libgomp.oacc-c-c++-common/acc-on-device-4.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Remove int casts
	from args of acc_on_device calls.
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Remove skip for
	-O0.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Same.

---
 gcc/omp-offload.c  | 121 -
 libgomp/openacc.h  |  14 +--
 .../libgomp.oacc-c-c++-common/acc-on-device-4.c|  18 +++
 .../libgomp.oacc-c-c++-common/gang-static-2.c  |   3 -
 .../libgomp.oacc-c-c++-common/loop-auto-1.c|   4 -
 .../libgomp.oacc-c-c++-common/loop-dim-default.c   |   3 -
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-1.c |   4 -
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-2.c |   4 -
 .../libgomp.oacc-c-c++-common/loop-gwv-1.c |   4 -
 .../libgomp.oacc-

Re: [PATCH] sel-sched: fix zero-usefulness case in sel_rank_for_schedule (PR 83513)

2017-12-26 Thread Andrey Belevantsev

On 25.12.2017 19:47, Alexander Monakov wrote:

Hello,

we need the following follow-up fix for priority comparison in
sel_rank_for_schedule as demonstrated by PR 83513.  Checked on
x86_64 by running a bootstrap and also checking for no regressions in
make -k check-gcc 
RUNTESTFLAGS="--target_board=unix/-fselective-scheduling/-fschedule-insns"

OK to apply?

Yes.

Andrey


PR rtl-optimization/83513
* sel-sched.c (sel_rank_for_schedule): Order by non-zero usefulness
before priority comparison.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index c1be0136551..be3813717ba 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3396,17 +3396,22 @@ sel_rank_for_schedule (const void *x, const void *y)
   else if (control_flow_insn_p (tmp2_insn) && !control_flow_insn_p (tmp_insn))
     return 1;
 
+  /* Prefer an expr with non-zero usefulness.  */
+  int u1 = EXPR_USEFULNESS (tmp), u2 = EXPR_USEFULNESS (tmp2);
+
+  if (u1 == 0)
+    {
+      if (u2 == 0)
+        u1 = u2 = 1;
+      else
+        return 1;
+    }
+  else if (u2 == 0)
+    return -1;
+
   /* Prefer an expr with greater priority.  */
-  if (EXPR_USEFULNESS (tmp) != 0 || EXPR_USEFULNESS (tmp2) != 0)
-    {
-      int p2 = EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2),
-          p1 = EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp);
-
-      val = p2 * EXPR_USEFULNESS (tmp2) - p1 * EXPR_USEFULNESS (tmp);
-    }
-  else
-    val = EXPR_PRIORITY (tmp2) - EXPR_PRIORITY (tmp)
-          + EXPR_PRIORITY_ADJ (tmp2) - EXPR_PRIORITY_ADJ (tmp);
+  val = (u2 * (EXPR_PRIORITY (tmp2) + EXPR_PRIORITY_ADJ (tmp2))
+         - u1 * (EXPR_PRIORITY (tmp) + EXPR_PRIORITY_ADJ (tmp)));
   if (val)
     return val;
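
The ordering implemented by this hunk can be tried outside GCC with a small qsort comparator. This is a sketch under assumptions: struct expr_sketch and its three fields are hypothetical stand-ins for the EXPR_PRIORITY, EXPR_PRIORITY_ADJ and EXPR_USEFULNESS accessors, and the control-flow checks before the hunk and the tie-breakers after it are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields read through the EXPR_* macros.  */
struct expr_sketch
{
  int priority;
  int priority_adj;
  int usefulness;
};

/* qsort comparator: exprs with non-zero usefulness sort first; within
   each group, a larger usefulness-weighted priority sorts first.  */
static int
rank_sketch (const void *x, const void *y)
{
  const struct expr_sketch *tmp = x, *tmp2 = y;
  int u1 = tmp->usefulness, u2 = tmp2->usefulness;

  if (u1 == 0)
    {
      if (u2 == 0)
        u1 = u2 = 1;  /* Both zero: compare by plain priority.  */
      else
        return 1;     /* Only X is useless: X goes after Y.  */
    }
  else if (u2 == 0)
    return -1;        /* Only Y is useless: X goes before Y.  */

  return (u2 * (tmp2->priority + tmp2->priority_adj)
          - u1 * (tmp->priority + tmp->priority_adj));
}
```

Sorting a mixed array with this comparator puts every expr with non-zero usefulness ahead of those with zero usefulness, whatever the priorities are, and then orders each group by its weighted priority.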






Re: [Patch, fortran] PR83076 - [8 Regression] ICE in gfc_deallocate_scalar_with_status, at fortran/trans.c:1598

2017-12-26 Thread Paul Richard Thomas
Hi All,

This is a complete rework of the patch and of the original mechanism
for adding caf token fields and finding them.

In this patch, the token fields are added to the derived types after
all the components have been resolved. This is done so that all the
tokens appear at the very end of the derived type, including the
hidden string lengths. This avoids the present situation, where the
token appears immediately after its associated component such that the
derived types are not compatible with modules or libraries
compiled without -fcoarray selected. All trans-types has to do now is
to find the component and have the component token field point to its
backend_decl. PR83319 is fixed by unconditionally adding the token
field to the descriptor when -fcoarray=lib is selected, whatever the
value of codimen.
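
The layout idea can be sketched in plain C. The names below (struct comp_sketch, add_caf_tokens, MAX_NAME) are hypothetical and only model a component list as an array; the "_caf_<name>" naming follows the sprintf call in the resolve.c hunk of this patch, everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 63  /* Assumed upper bound on a user-visible name.  */

/* Hypothetical model of a derived-type component.  */
struct comp_sketch
{
  char name[MAX_NAME + 9];
  int is_array;      /* Has a dimension or codimension.  */
  int alloc_or_ptr;  /* Allocatable or pointer.  */
  int is_token;      /* Artificial token field added by this sketch.  */
};

/* Append one "_caf_<name>" token for every allocatable or pointer scalar
   among the first N entries of COMPS, after all existing entries.
   Return the new count, or -1 if CAP entries are not enough.  */
static int
add_caf_tokens (struct comp_sketch *comps, int n, int cap)
{
  int orig = n;

  for (int i = 0; i < orig; i++)
    {
      char name[MAX_NAME + 9];
      int found = 0;

      if (comps[i].is_token || comps[i].is_array || !comps[i].alloc_or_ptr)
        continue;
      snprintf (name, sizeof name, "_caf_%s", comps[i].name);
      for (int j = 0; j < n; j++)
        if (strcmp (comps[j].name, name) == 0)
          found = 1;
      if (found)
        continue;  /* Token already present, do not add it twice.  */
      if (n == cap)
        return -1;
      memset (&comps[n], 0, sizeof comps[n]);
      strcpy (comps[n].name, name);
      comps[n].is_token = 1;
      n++;
    }
  return n;
}
```

Because tokens are only ever appended, the original components keep their positions in this model, which is the property the description relies on for compatibility with code compiled without -fcoarray.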

This is something of a belt-and-braces approach, in that the token
fields will sometimes be added when not needed. However, it is better
that than the ICEs that occur when they are missing.

Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch?

Paul

2017-12-26  Paul Thomas  

PR fortran/83076
* resolve.c (resolve_fl_derived0): Add caf_token fields for
allocatable and pointer scalars, when -fcoarray selected.
* trans-types.c (gfc_copy_dt_decls_ifequal): Copy the token
field as well as the backend_decl.
(gfc_get_derived_type): Flag GFC_FCOARRAY_LIB for module
derived types that are not vtypes. Components with caf_token
attribute are pvoid types. For a component requiring it, find
the caf_token field and have the component token field point to
its backend_decl.

PR fortran/83319
* trans-types.c (gfc_get_array_descriptor_base): Add the token
field to the descriptor even when codimen not set.


2017-12-26  Paul Thomas  

PR fortran/83076
* gfortran.dg/coarray_45.f90: New test.

PR fortran/83319
* gfortran.dg/coarray_46.f90: New test.


On 3 December 2017 at 23:48, Dominique d'Humières
 wrote:
> Dear Paul,
>
>> Bootstrapped and regtested on FC23/x86_64 - OK for trunk?
>
> See my comment 7 in the PR.
>
> Dominique
>



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 256000)
--- gcc/fortran/gfortran.h  (working copy)
*** typedef struct
*** 870,876 
unsigned alloc_comp:1, pointer_comp:1, proc_pointer_comp:1,
   private_comp:1, zero_comp:1, coarray_comp:1, lock_comp:1,
   event_comp:1, defined_assign_comp:1, unlimited_polymorphic:1,
!  has_dtio_procs:1;
  
/* This is a temporary selector for SELECT TYPE or an associate
   variable for SELECT_TYPE or ASSOCIATE.  */
--- 870,876 
unsigned alloc_comp:1, pointer_comp:1, proc_pointer_comp:1,
   private_comp:1, zero_comp:1, coarray_comp:1, lock_comp:1,
   event_comp:1, defined_assign_comp:1, unlimited_polymorphic:1,
!  has_dtio_procs:1, caf_token:1;
  
/* This is a temporary selector for SELECT TYPE or an associate
   variable for SELECT_TYPE or ASSOCIATE.  */
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 256000)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_fl_derived0 (gfc_symbol *sym)
*** 13992,13997 
--- 13992,14022 
    if (!success)
      return false;
  
+   /* Now add the caf token field, where needed.  */
+   if (flag_coarray != GFC_FCOARRAY_NONE
+       && !sym->attr.is_class && !sym->attr.vtype)
+     {
+       for (c = sym->components; c; c = c->next)
+         if (!c->attr.dimension && !c->attr.codimension
+             && (c->attr.allocatable || c->attr.pointer))
+           {
+             char name[GFC_MAX_SYMBOL_LEN+9];
+             gfc_component *token;
+             sprintf (name, "_caf_%s", c->name);
+             token = gfc_find_component (sym, name, true, true, NULL);
+             if (token == NULL)
+               {
+                 if (!gfc_add_component (sym, name, &token))
+                   return false;
+                 token->ts.type = BT_VOID;
+                 token->ts.kind = gfc_default_integer_kind;
+                 token->attr.access = ACCESS_PRIVATE;
+                 token->attr.artificial = 1;
+                 token->attr.caf_token = 1;
+               }
+           }
+     }
+ 
    check_defined_assignments (sym);
  
    if (!sym->attr.defined_assign_comp && super_type)
Index: gcc/fortran/trans-types.c
===
*** gcc/fortran/trans-types.c   (revision 256000)
--- gcc/fortran/trans-types.c   (working copy)
*** gfc_get_array_descriptor_base (int dimen
*** 1837,1843 
TREE_NO_WARNING (decl) = 1;
  }
  
!   if (flag_coarray == GFC_FCOARRAY_LIB && codimen)
  {
decl = gfc_add_field_to_struc

Re: [patch, fortran] Fix PR 83540

2017-12-26 Thread Paul Richard Thomas
OK - thanks for the patch.

Paul


On 26 December 2017 at 12:12, Thomas Koenig  wrote:
> Hello world,
>
> this rather self-explanatory patch makes sure we don't get an error
> using reallocation on assignment for inlining matmul when
> we don't have reallocation on assignment.
>
> Regression-tested. OK for trunk?
>
> Regards
>
> Thomas
>
> 2017-12-25  Thomas Koenig  
>
> PR fortran/83540
> * frontend-passes.c (create_var): If an array to be created
> has unknown size and -fno-realloc-lhs is in effect,
> return NULL.
>
> 2017-12-25  Thomas Koenig  
>
> PR fortran/83540
> * gfortran.dg/inline_matmul_20.f90: New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[testsuite, committed] Use relative line number in unroll-5.c

2017-12-26 Thread Tom de Vries

[ was: Re: [C/C++] Add support for #pragma GCC unroll v3 ]

On 11/25/2017 11:15 AM, Eric Botcazou wrote:

Index: testsuite/c-c++-common/unroll-5.c
===
--- testsuite/c-c++-common/unroll-5.c   (revision 0)
+++ testsuite/c-c++-common/unroll-5.c   (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+extern void bar (int);
+
+int j;
+
+void test (void)
+{
+  #pragma GCC unroll 4+4
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll -1 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll 200 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll j /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+/* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+
+  #pragma GCC unroll 4.2 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */
+  for (unsigned long i = 1; i <= 8; ++i)
+    bar(i);
+}


Hi,

this patch changes the absolute line number into a relative one.

Tested on x86_64 and committed.

Thanks,
- Tom
Use relative line number in unroll-5.c

2017-12-26  Tom de Vries  

	* c-c++-common/unroll-5.c: Use relative line number.

---
 gcc/testsuite/c-c++-common/unroll-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/unroll-5.c b/gcc/testsuite/c-c++-common/unroll-5.c
index 754f3b1..b728066 100644
--- a/gcc/testsuite/c-c++-common/unroll-5.c
+++ b/gcc/testsuite/c-c++-common/unroll-5.c
@@ -19,7 +19,7 @@ void test (void)
     bar(i);
 
   #pragma GCC unroll j	/* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than" } */
-/* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */
+/* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } .-1 } */
   for (unsigned long i = 1; i <= 8; ++i)
     bar(i);
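
As an illustration of what the relative form buys, the following sketch resolves a line specification to an absolute line number. It is not the testsuite's implementation: resolve_line_sketch is a hypothetical helper, and it assumes only the forms an absolute number, ".", ".+N" and ".-N", where "." means the line the directive itself is on.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the absolute line that SPEC refers to when the directive sits
   on DIRECTIVE_LINE, or -1 for a spec this sketch does not understand.  */
static int
resolve_line_sketch (const char *spec, int directive_line)
{
  char *end;
  long value;

  if (spec[0] == '.')
    {
      if (spec[1] == '\0')
        return directive_line;
      if (spec[1] != '+' && spec[1] != '-')
        return -1;
      /* strtol consumes the sign and the digits of the offset.  */
      value = strtol (spec + 1, &end, 10);
      if (end == spec + 1 || *end != '\0')
        return -1;
      return directive_line + (int) value;
    }

  value = strtol (spec, &end, 10);
  if (end == spec || *end != '\0' || value <= 0)
    return -1;
  return (int) value;
}
```

With the directive on line 22 of unroll-5.c, both "21" and ".-1" resolve to line 21 in this model, but only the relative form stays correct if lines are later added above it.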