Hi Prathamesh,

On 10.07.24 at 13:22, Prathamesh Kulkarni wrote:
Hi,

The attached patch lowers zeroing array assignment to memset for allocatable arrays. For example:

  subroutine test(z, n)
    implicit none
    integer :: n
    real(4), allocatable :: z(:,:,:)
    allocate(z(n, 8192, 2048))
    z = 0
  end subroutine

results in the following call to memset instead of 3 nested loops for z = 0:

  (void) __builtin_memset ((void *) z->data, 0,
    (unsigned long) ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
                     * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
                     * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));

The patch gives a significant speedup for an internal Fortran application on AArch64 -mcpu=grace (and potentially on other AArch64 cores too).

Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit?
no, it is NOT ok. Consider:

  subroutine test0 (n, z)
    implicit none
    integer :: n
    real, pointer :: z(:,:,:) ! need not be contiguous!
    z = 0
  end subroutine

After your patch this also generates a memset, but this cannot be correct in general. One would need a test on the contiguity of the array before memset can be used.

In principle this is a nice idea, and IIRC there exists a very old PR on this (by Thomas König?). So it might be worth pursuing.

Thanks,
Harald
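To make the contiguity concern concrete, here is a minimal sketch, not taken from the patch or its testsuite. The program name, the array `storage`, and its size are hypothetical, and the byte counts in the comments assume a 4-byte default real. It associates a pointer with a strided section and queries the Fortran 2008 intrinsic `is_contiguous` before assigning zero.

```fortran
program contiguity_demo
  implicit none
  real, target  :: storage(8)   ! hypothetical backing array
  real, pointer :: z(:)

  storage = 1.0

  ! Associate z with every second element: a valid but non-contiguous target.
  z => storage(1:8:2)

  ! is_contiguous is a Fortran 2008 intrinsic; it is false for this section.
  print *, 'is_contiguous(z) = ', is_contiguous(z)

  ! Correct semantics: only the four elements z refers to become zero.
  ! A memset of size(z)*4 bytes starting at the first element would instead
  ! zero storage(1:4), clobbering storage(2) and storage(4) and leaving
  ! storage(5) and storage(7) untouched.
  z = 0

  print *, storage
end program contiguity_demo
```

With the loop-based assignment, `storage` ends up alternating between 0 and 1. A lowering to memset would only be safe on a path where the array is known to be contiguous, for example behind a runtime check equivalent to `is_contiguous(z)`.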
Signed-off-by: Prathamesh Kulkarni <[email protected]>

Thanks,
Prathamesh
