[Bug fortran/118259] -O3 optimisation bug fixed with -fno-inline

2024-12-31 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118259 --- Comment #4 from mjr19 at cam dot ac.uk --- And using seed=iand(seed*int(1103515245,selected_int_kind(18))+12345,z'7fff') also works as expected. Converting the code to C shows the same behaviour as the Fortran if seed is a
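The workaround quoted in this comment widens the multiplication to a 64-bit integer kind before masking. A minimal C sketch of that idea follows, assuming the generator step is exactly the expression quoted above. The name lcg_next is hypothetical and does not come from the bug report.

```c
#include <stdint.h>

/* Hypothetical helper: one step of the generator quoted in the comment,
 *   seed = iand(seed*1103515245 + 12345, z'7fff')
 * The product is formed in 64-bit arithmetic, so it cannot overflow
 * for any 32-bit seed. The mask keeps the low 15 bits. */
static int32_t lcg_next(int32_t seed)
{
    int64_t wide = (int64_t)seed * INT64_C(1103515245) + 12345;
    return (int32_t)(wide & 0x7fff);
}
```

For a seed of 0 the step returns 12345, since the product term vanishes and 12345 is below the mask.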

[Bug fortran/118259] -O3 optimisation bug fixed with -fno-inline

2024-12-31 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118259 --- Comment #3 from mjr19 at cam dot ac.uk --- That is a very interesting point. If I change the constants in the random number generator to seed=iand(seed*110+123,z'7fff') then the answer with '-O3' is

[Bug fortran/118259] New: -O3 optimisation bug fixed with -fno-inline

2024-12-31 Thread mjr19 at cam dot ac.uk via Gcc-bugs
: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- Created attachment 60012 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60012&action=edit Module-free test case The following code, which is a poor random

[Bug fortran/117805] complex type, -Ofast and IEEE-754

2024-11-30 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117805 --- Comment #20 from mjr19 at cam dot ac.uk --- I am not convinced that gfortran's current behaviour is wholly consistent with what a mathematician would reasonably expect. When I was taught complex arithmetic, multiplication by one and add

[Bug fortran/117805] complex type, -Ofast and IEEE-754

2024-11-28 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117805 --- Comment #13 from mjr19 at cam dot ac.uk --- (In reply to kargls from comment #11) > On 11/28/24 04:54, rguenth at gcc dot gnu.org wrote: > > The Fortran standard stops at this point and does not specify the > actual underlyi

[Bug fortran/117805] complex type, -Ofast and IEEE-754

2024-11-28 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117805 --- Comment #9 from mjr19 at cam dot ac.uk --- (In reply to kargls from comment #6) I agree that parts of the reasoning from J3 are a little surprising, but other parts seem sound, and the conclusion is unambiguous. (I also disagree with its

[Bug fortran/117805] complex type, -Ofast and IEEE-754

2024-11-27 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117805 --- Comment #5 from mjr19 at cam dot ac.uk --- Compiling with -fno-signed-zeros does work surprisingly well. I say "surprisingly", as I think that the change affects more than just signed zeros, in that 3.0*(2.0,Inf) might be (6.0,In

[Bug fortran/117805] complex type, -Ofast and IEEE-754

2024-11-27 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117805 --- Comment #2 from mjr19 at cam dot ac.uk --- There will certainly be differences in some cases. If R=2.0 and Z=-0.0i the answer might be (0.0,0.0) or (0.0,-0.0). The point is that Fortran does not specify which of these is correct. Both are
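The two candidate answers in this comment can be reproduced with a short C sketch, assuming Z has a real part of +0.0 and IEEE-754 round-to-nearest arithmetic without -ffast-math. The type cplx and both function names are hypothetical, chosen only for illustration.

```c
#include <math.h>

/* Hypothetical minimal complex type, used only for this illustration. */
typedef struct { double re, im; } cplx;

/* Scale each component by the real factor r: two multiplications. */
static cplx scale_componentwise(double r, cplx z)
{
    cplx out = { r * z.re, r * z.im };
    return out;
}

/* Promote r to (r, 0.0) and apply the full complex product. */
static cplx scale_promoted(double r, cplx z)
{
    cplx out = { r * z.re - 0.0 * z.im,
                 r * z.im + 0.0 * z.re };
    return out;
}
```

With r = 2.0 and z = (0.0, -0.0), the componentwise form keeps the negative zero in the imaginary part, while the promoted form adds +0.0 to -0.0 and so yields +0.0. The two results compare equal and differ only in the sign bit.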

[Bug fortran/117805] New: complex type, -Ofast and IEEE-754

2024-11-27 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- Gfortran avoids certain optimisations unless -Ofast is specified, as the optimisations are not compatible with IEEE-754. These include optimisations on the complex type. As one example

[Bug fortran/107294] Missed optimization: multiplying real with complex number in Fortran (only)

2024-11-04 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294 --- Comment #7 from mjr19 at cam dot ac.uk --- I was sufficiently confused having read the Standard to persuade Dr John Reid to submit a request for clarification to J3, the Fortran Standards Committee. The request is at https://j3-fortran.org

[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-08-15 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #5 from mjr19 at cam dot ac.uk --- I think in general using partial sums improves accuracy. If one assumes that all of the data have the same sign and similar magnitude, then by the time the sum is nearly complete one is adding a
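As a rough illustration of what "partial sums" means here, the C sketch below contrasts a strictly in-order sum with one that keeps four independent accumulators. The choice of four lanes and both function names are assumptions made for this example, not taken from the bug report or from what gfortran generates.

```c
#include <stddef.h>

/* Strictly in-order sum: each addition depends on the previous one. */
static double sum_in_order(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent partial sums, combined at the end.
 * Each accumulator sees only a quarter of the data, so it stays
 * closer in magnitude to the elements being added to it. */
static double sum_partial4(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double tail = 0.0;              /* leftover elements when n % 4 != 0 */
    for (; i < n; i++)
        tail += a[i];
    return (s0 + s1) + (s2 + s3) + tail;
}
```

The two functions agree exactly whenever every intermediate sum is exactly representable, for example on small integers. On general data they may differ in the last few bits, which is why a compiler will not substitute one for the other without being told that reassociation is allowed.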

[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-08-06 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #3 from mjr19 at cam dot ac.uk --- It seems that most of these are in-line expanded by gfortran-14.1, at least in some cases. function foo(a,n) real(kind(1d0))::a(*),foo integer::n foo=sum(a(1:n)) end function foo and

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-08-01 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #8 from mjr19 at cam dot ac.uk --- If it is tricky to teach gfortran that it can flip the signs of alternate elements in a vector trivially with an xor, would a possible step to an improvement be to teach it that the cost of vpermpd

[Bug fortran/116128] missed optimisation: fortran sum instrinsic performed in order

2024-07-31 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128 --- Comment #1 from mjr19 at cam dot ac.uk --- The same comment applies to maxval and minval, which vectorise with -Ofast only for -mavx2, although the answer will be independent of the ordering of the scalar min/max operations. In contrast

[Bug tree-optimization/116109] Missed optimisation: unnecessary register dependency on reduction

2024-07-30 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109 --- Comment #3 from mjr19 at cam dot ac.uk --- It might be helpful if GCC considered this optimisation separately from unrolling. Traditional unrolling attempts to reduce the overhead of the (integer) loop control instructions, but with

[Bug fortran/116128] New: missed optimisation: fortran sum instrinsic performed in order

2024-07-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- gfortran-14 performs the Fortran sum intrinsic strictly in order, thus preventing any vectorisation and imposing a data dependency

[Bug tree-optimization/116109] New: Missed optimisation: unnecessary register dependency on reduction

2024-07-26 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- With a loop such as function mod2(x,n) real(kind(1d0))::mod2,x(*),t integer::i,n t=0 !$omp simd reduction(+:t) do

[Bug tree-optimization/115709] missed optimisation: vperms not reordered to eliminate

2024-07-02 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709 --- Comment #3 from mjr19 at cam dot ac.uk --- Created attachment 58558 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58558&action=edit Demo of effect of vperm rearrangement I still believe that my code is correct. To make what I

[Bug fortran/115711] New: Fortran: extra malloc and copy with transfer

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- subroutine foo(a,b) complex(kind(1d0))::a real(kind(1d0))::b(2) b=transfer(a,b) end subroutine foo produces optimal code. But change the declarations to complex

[Bug fortran/115710] New: fortran complex abs does not vectorise

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- subroutine foo(a,b,n) complex(kind(1d0))::a(*) real(kind(1d0))::b(*) integer::i,n do i=1,n b(i)=abs(a(i))**2 end do end subroutine foo fails to vectorise with

[Bug tree-optimization/115709] New: missed optimisation: vperms not reordered to eliminate

2024-06-29 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- #include <complex.h> void foo(double complex *a, double *b, int n){ int i; for(i=0;i

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-06-25 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #8 from mjr19 at cam dot ac.uk --- Ooops -- timings not ns/iteration as claimed, nor even comparable between the m3spf and m4spf examples, but they are consistent within each example.

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-06-25 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #7 from mjr19 at cam dot ac.uk --- The patch to GCC 15 in commit r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f from pr 68855 has made a significant improvement to the optimisation of these examples at -O3, causing the -Ofast

[Bug fortran/115563] Unnecessary brackets prevent fortran vectorisation

2024-06-24 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563 --- Comment #6 from mjr19 at cam dot ac.uk --- A further comment to aid others reading this report. It is not just unnecessary brackets which used to prevent vectorisation, but also necessary ones. subroutine foo(a,b,c,n) complex (kind(1d0

[Bug fortran/115563] Unnecessary brackets prevent fortran vectorisation

2024-06-21 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115563 --- Comment #5 from mjr19 at cam dot ac.uk --- I'm glad this was useful, and thanks for the impressively rapid fix. I stumbled across this by chance whilst trying to construct a minimal example for a rather different missed vectorisation case.

[Bug fortran/115563] New: Unnecessary brackets prevent fortran vectorisation

2024-06-20 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- The code subroutine foo(a,n) complex (kind(1d0)) :: a(*) integer :: i,n !$OMP SIMD do i=1,n a(i)=(a(i)+(6d0,1d0)) enddo end subroutine foo compiled

[Bug fortran/107294] Missed optimization: multiplying real with complex number in Fortran (only)

2024-06-17 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107294 mjr19 at cam dot ac.uk changed: What|Removed |Added CC||mjr19 at cam dot ac.uk

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-05-14 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #7 from mjr19 at cam dot ac.uk --- Another manifestation of this issue in GCC 13.1 and 14.1 is that the loop do i=1,n c(i)=a(i)*c(i)*(0d0,1d0) enddo takes about twice as long to run as do i=1,n c(i)=a(i)*(0d0,1d0

[Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-05-01 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #5 from mjr19 at cam dot ac.uk --- Note that bug 114767 also turns out to be a case in which the inability to alternate neg and nop along a vector leads to poor performance with some operations on the complex type. That optimisation

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-19 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #6 from mjr19 at cam dot ac.uk --- I was starting to wonder whether this issue might be related to that in bug 114324, which is a slightly more complicated example in which multiplication by a purely imaginary number destroys

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #4 from mjr19 at cam dot ac.uk --- An issue which I suspect is related is shown by subroutine zradd(c,n) integer :: i,n complex(kind(1d0)) :: c(*) do i=1,n c(i)=c(i)+1d0 enddo end subroutine If compiled with gfortran

[Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767 --- Comment #2 from mjr19 at cam dot ac.uk --- Ah, I see. An inability to alternate negation with noop also means that conjugation is treated suboptimally. do i=1,n c(i)=conjg(c(i)) enddo Here gfortran-13 and -14 are differently

[Bug fortran/114767] New: gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal

2024-04-18 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- Gfortran 14 shows considerable improvement over 13.1 on x86_64 AVX2 on the test case subroutine scale_i(c,n) integer :: i,n complex

[Bug tree-optimization/114324] [13/14 Regression] AVX2 vectorisation performance regression with gfortran 13/14

2024-03-15 Thread mjr19 at cam dot ac.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324 --- Comment #4 from mjr19 at cam dot ac.uk --- Created attachment 57713 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57713&action=edit Second testcase, very similar to first Thank you for looking into this. The real code in quest

[Bug fortran/114324] New: AVX2 vectorisation performance regression with gfortran 13/14

2024-03-13 Thread mjr19 at cam dot ac.uk via Gcc-bugs
Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- Created attachment 57685 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57685&action=edit Test case of loop showing perf

[Bug fortran/92698] Unnecessary copy in overlapping array assignment

2019-11-30 Thread mjr19 at cam dot ac.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92698 --- Comment #2 from mjr19 at cam dot ac.uk --- Thomas is quite correct that I had failed to mark the array as contiguous, at which point the double copy is more reasonable (although memcpy will also expect its arguments to be contiguous). He

[Bug fortran/92698] New: Unnecessary copy in overlapping array assignment

2019-11-27 Thread mjr19 at cam dot ac.uk
: fortran Assignee: unassigned at gcc dot gnu.org Reporter: mjr19 at cam dot ac.uk Target Milestone: --- subroutine cpy(a,src,dest,len) integer, intent(in) :: src,dest,len real(kind(1d0)), intent(inout) :: a(:) a(dest:dest+len-1)=a(src:src+len-1) end subroutine cpy