Re: [PATCH] fortran: Factor the evaluation of MINLOCK/MAXLOC's BACK argument

2024-07-12 Thread Mikael Morin

Le 11/07/2024 à 22:49, Harald Anlauf a écrit :

Hi Mikael,

Am 11.07.24 um 21:55 schrieb Mikael Morin:

From: Mikael Morin 

Hello,

I discovered this while testing the inline MINLOC/MAXLOC (aka PR90608) 
patches.

Regression tested on x86_64-linux.
OK for master?


this is a nice finding!  (NAG seems to fail on the cases with
array size 0, while Intel gets it right.)

The commit message promises to cover all variations ("with/out NANs"?)
but I fail to see these.  Were these removed in the submission?

No, it's actually type with NAN vs type without NAN; the code generated 
is different between integral types (which don't have NANs) and floating 
point types (which may have NANs).


I'll rephrase to integral vs floating-point types.


Otherwise the patch looks pretty simple and is OK for mainline.
But do not forget to s/MINLOCK/MINLOC/ in the summary.


Good catch, thanks.


Thanks for the patch!


Thanks for the review.


Harald


-- 8< --

Move the evaluation of the BACK argument out of the loop in the inline 
code

generated for MINLOC or MAXLOC.  For that, add a new (scalar) element
associated with BACK to the scalarization loop chain, evaluate the 
argument

with the context of that element, and let the scalarizer do its job.

The problem was not only a missed optimisation, but also a wrong code
one in the cases where the expression associated with BACK is not free of
side-effects, making multiple evaluations observable.

The new tests check the evaluation count of the BACK argument, and try to
cover all the variations (with/out NANs, constant or unknown shape, 
absent

or scalar or array MASK) supported by the inline implementation of the
functions.  Care has been taken to not check the case of a constant 
.FALSE.

MASK, for which the evaluation of BACK can be elided.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Create a new
scalar scalarization chain element if BACK is present.  Add it to
the loop.  Set the scalarization chain before evaluating the
argument.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_5.f90: New test.
* gfortran.dg/minloc_5.f90: New test.
---
  gcc/fortran/trans-intrinsic.cc |  10 +
  gcc/testsuite/gfortran.dg/maxloc_5.f90 | 257 +
  gcc/testsuite/gfortran.dg/minloc_5.f90 | 257 +
  3 files changed, 524 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/maxloc_5.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minloc_5.f90

diff --git a/gcc/fortran/trans-intrinsic.cc 
b/gcc/fortran/trans-intrinsic.cc

index 5ea10e84060..cadbd177452 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5325,6 +5325,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    gfc_actual_arglist *actual;
    gfc_ss *arrayss;
    gfc_ss *maskss;
+  gfc_ss *backss;
    gfc_se arrayse;
    gfc_se maskse;
    gfc_expr *arrayexpr;
@@ -5390,6 +5391,11 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

  && maskexpr->symtree->n.sym->attr.dummy
  && maskexpr->symtree->n.sym->attr.optional;
    backexpr = actual->next->next->expr;
+  if (backexpr)
+    backss = gfc_get_scalar_ss (gfc_ss_terminator, backexpr);
+  else
+    backss = nullptr;
+
    nonempty = NULL;
    if (maskexpr && maskexpr->rank != 0)
  {
@@ -5449,6 +5455,9 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    if (maskss)
  gfc_add_ss_to_loop (&loop, maskss);

+  if (backss)
+    gfc_add_ss_to_loop (&loop, backss);
+
    gfc_add_ss_to_loop (&loop, arrayss);

    /* Initialize the loop.  */
@@ -5535,6 +5544,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    gfc_add_block_to_block (&block, &arrayse.pre);

    gfc_init_se (&backse, NULL);
+  backse.ss = backss;
    gfc_conv_expr_val (&backse, backexpr);
    gfc_add_block_to_block (&block, &backse.pre);

diff --git a/gcc/testsuite/gfortran.dg/maxloc_5.f90 
b/gcc/testsuite/gfortran.dg/maxloc_5.f90

new file mode 100644
index 000..5d722450c8f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/maxloc_5.f90
@@ -0,0 +1,257 @@
+! { dg-do run }
+!
+! Check that the evaluation of MAXLOC's BACK argument is made only once
+! before the scalarisation loops.
+
+program p
+  implicit none
+  integer, parameter :: data10(*) = (/ 7, 4, 7, 6, 6, 4, 6, 3, 9, 8 /)
+  logical, parameter :: mask10(*) = (/ .false., .true., .false., &
+   .false., .true., .true.,  &
+   .true. , .true., .false., &
+   .false. /)
+  integer :: calls_count = 0
+  call check_int_const_shape
+  call check_int_const_shape_scalar_mask
+  call check_int_const_shape_array_mask
+  call check_int_const_shape_optional_mask_present
+  call check_int_const_shape_optional_mask_absent
+  call check_int_const_shape_empty
+  call check_int

RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-12 Thread Prathamesh Kulkarni


> -Original Message-
> From: Harald Anlauf 
> Sent: Friday, July 12, 2024 1:52 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fortran@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Prathamesh!
Hi Harald,
> 
> Am 11.07.24 um 12:16 schrieb Prathamesh Kulkarni:
> >
> >
> >> -Original Message-
> >> From: Harald Anlauf 
> >> Sent: Thursday, July 11, 2024 12:53 AM
> >> To: Prathamesh Kulkarni ; gcc-
> >> patc...@gcc.gnu.org; fortran@gcc.gnu.org
> >> Subject: Re: Lower zeroing array assignment to memset for
> allocatable
> >> arrays
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Hi Prathamesh,
> >>
> >> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> >>> Hi,
> >>> The attached patch lowers zeroing array assignment to memset for
> >> allocatable arrays.
> >>>
> >>> For example:
> >>> subroutine test(z, n)
> >>>   implicit none
> >>>   integer :: n
> >>>   real(4), allocatable :: z(:,:,:)
> >>>
> >>>   allocate(z(n, 8192, 2048))
> >>>   z = 0
> >>> end subroutine
> >>>
> >>> results in following call to memset instead of 3 nested loops for
> z
> >> = 0:
> >>>   (void) __builtin_memset ((void *) z->data, 0, (unsigned
> long)
> >>> MAX_EXPR dim[0].ubound - z->dim[0].lbound, -1> + 1) *
> >>> (MAX_EXPR dim[1].ubound - z->dim[1].lbound, -1> + 1)) *
> >> (MAX_EXPR
> >>> dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >>>
> >>> The patch significantly improves speedup for an internal Fortran
> >> application on AArch64 -mcpu=grace (and potentially on other
> AArch64
> >> cores too).
> >>> Bootstrapped+tested on aarch64-linux-gnu.
> >>> Does the patch look OK to commit ?
> >>
> >> no, it is NOT ok.
> >>
> >> Consider:
> >>
> >> subroutine test0 (n, z)
> >> implicit none
> >> integer :: n
> >> real, pointer :: z(:,:,:) ! need not be contiguous!
> >> z = 0
> >> end subroutine
> >>
> >> After your patch this also generates a memset, but this cannot be
> >> true in general.  One would need to have a test on contiguity of
> the
> >> array before memset can be used.
> >>
> >> In principle this is a nice idea, and IIRC there exists a very old
> PR
> >> on this (by Thomas König?).  So it might be worth pursuing.
> > Hi Harald,
> > Thanks for the suggestions!
> > The attached patch checks gfc_is_simply_contiguous(expr, true,
> false)
> > before lowering to memset, which avoids generating memset for your
> example above.
> 
> This is much better, as it avoids generating false memsets where it
> should not.  However, you now miss cases where the array is a
> component reference, as in:
> 
> subroutine test_dt (dt)
>implicit none
>type t
>   real, allocatable :: x(:,:,:) ! contiguous!
>   real, pointer, contiguous :: y(:,:,:) ! contiguous!
>   real, pointer :: z(:,:,:) ! need not be
> contiguous!
>end type t
>type(t) :: dt
>dt% x = 0  ! memset possible!
>dt% y = 0  ! memset possible!
>dt% z = 0  ! memset NOT possible!
> end subroutine
> 
> You'll need to cycle through the component references and apply the
> check for contiguity to the ultimate component, not the top level.
> 
> Can you have another look?
Thanks for the review!
It seems that component references are not currently handled even for static 
size arrays ?
For eg:
subroutine test_dt (dt, y)
   implicit none
   real :: y (10, 20, 30)
   type t
  real :: x(10, 20, 30)
   end type t
   type(t) :: dt
   y = 0
   dt% x = 0
end subroutine

With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

  /* First check it's an array.  */
  if (expr->rank < 1 || !expr->ref || expr->ref->next)
return false;

and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY.
Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 
to fix PR33370.
(Even after removing these checks, the previous patch bails out from 
gfc_trans_zero_assign because
GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up 
returning NULL_TREE)
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since it looks like a bigger change and an extension to current functionality, 
will it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in follow up one ?

Thanks,
Prathamesh  
 
> 
> Thanks,
> Harald
> 
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the attached patch look OK ?
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Harald
> >>
> >>
> >>> Signed-off-by: Prathamesh Kulkarni 
> >>>
> >>> Thanks,
> >>> Prathamesh
> >



Re: Lower zeroing array assignment to memset for allocatable arrays

2024-07-12 Thread Harald Anlauf

Hi Prathamesh,

Am 12.07.24 um 15:31 schrieb Prathamesh Kulkarni:

It seems that component references are not currently handled even for static 
size arrays ?
For eg:
subroutine test_dt (dt, y)
implicit none
real :: y (10, 20, 30)
type t
   real :: x(10, 20, 30)
end type t
type(t) :: dt
y = 0
dt% x = 0
end subroutine

With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

   /* First check it's an array.  */
   if (expr->rank < 1 || !expr->ref || expr->ref->next)
 return false;

and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY.


Indeed that check (as is) prevents the use of component refs.
(I just tried to modify the this part to cycle thru the refs,
but then I get regressions in the testsuite for some of the
coarray tests.  Furthermore, gfc_trans_zero_assign would
need further changes to handle even the constant shapes
from above.)


Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 
to fix PR33370.
(Even after removing these checks, the previous patch bails out from 
gfc_trans_zero_assign because
GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up 
returning NULL_TREE)
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since it looks like a bigger change and an extension to current functionality, 
will it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in follow up one ?


I agree that it is reasonable to defer the handling of arrays as
components of derived types, and recommend to do the following:

- replace "&& gfc_is_simply_contiguous (expr, true, false))" in your
  last patch by "&& gfc_is_simply_contiguous (expr, false, false))",
  as that would also allow to treat

  z(:,::1,:) = 0

  as contiguous if z is allocatable or a contiguous pointer.

- open a PR in bugzilla to track the missed-optimization for
  the cases we discussed here, and link the discussion in the ML.

Your patch then will be OK for mainline.

Thanks,
Harald


Thanks,
Prathamesh


Thanks,
Harald


Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh


Thanks,
Harald



Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh