https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

            Bug ID: 101053
           Summary: Incorrect code at -O1 on arm64
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gilles.gouaillardet at gmail dot com
  Target Milestone: ---

Created attachment 51003
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51003&action=edit
A simple reproducer

This issue was initially reported at
https://github.com/numpy/numpy/issues/18422

Bottom line, since the gcc-9 series(!), gfortran generates incorrect code for
OpenBLAS at -O1 and above on arm64.

Here is how to reproduce the issue:

# set the local prefix (to be customized)
prefix=...

# Download OpenBLAS
wget https://github.com/xianyi/OpenBLAS/releases/download/v0.3.15/OpenBLAS-0.3.15.tar.gz

# Build and install OpenBLAS
tar xfz OpenBLAS-0.3.15.tar.gz
cd OpenBLAS-0.3.15/
make -j 56 libs netlib shared BINARY='64' CC='gcc' FC='gfortran' MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' COMMON_OPT="-g -O1"
make install PREFIX=$prefix
cd ..

# Build and execute the attached reproducer
gfortran dgehd2.f90 -o dgehd2 -L$prefix/lib -Wl,-rpath,$prefix/lib -lopenblas
./dgehd2

Expected result (obtained with gfortran 8.3.1 (from RHEL 8) and 8.5.0, or if
OpenBLAS is built with COMMON_OPT="-g -O0"):

 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137
  -3.5970073030870449        11.461538461538458       -3.6923076923076938
 -0.24806946917841688        4.3076923076923075        2.5384615384615383

Current result (from gfortran 9.1.0 up to the gcc-12-20210606 snapshot):

 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137
                 -Infinity                      NaN                      NaN
                 -Infinity                      NaN                      NaN

The faulty code is in the dgehd2 subroutine:

      PARAMETER          ( ONE = 1.0D+0 )

      DO 10 I = ILO, IHI - 1
         CALL DLARFG( IHI-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
     $                TAU( I ) )
         AII = A( I+1, I )
         A( I+1, I ) = ONE
         CALL DLARF( 'Right', IHI, IHI-I, A( I+1, I ), 1, TAU( I ),
     $               A( 1, I+1 ), LDA, WORK )
         CALL DLARF( 'Left', IHI-I, N-I, A( I+1, I ), 1, TAU( I ),
     $               A( I+1, I+1 ), LDA, WORK )
         A( I+1, I ) = AII
   10 CONTINUE

The problem shows up at the following line:

         A( I+1, I ) = ONE

Here is a snippet of the assembly (generated with gfortran 10.3.0):

.LBE9:
        .loc 1 206 72 view .LVU34
        fmov    d9, 1.0e+0
.LBB10:
        .loc 1 211 72 view .LVU35
        adrp    x0, .LC1
        add     x0, x0, :lo12:.LC1
        str     x0, [sp, 192]
.LBE10:
.LBB11:
        .loc 1 216 72 view .LVU36
        adrp    x0, .LC2
        add     x0, x0, :lo12:.LC2
        str     x0, [sp, 200]
.LVL20:
.L7:
        .loc 1 216 72 is_stmt 0 view .LVU37
.LBE11:
.LBB12:
        .loc 1 204 72 is_stmt 1 view .LVU38
        ldr     w0, [x22]
        sub     w0, w0, w20
        str     w0, [sp, 224]
        add     w0, w20, 2
        ldr     w2, [x26]
        cmp     w2, w0
        csel    w2, w2, w0, le
        mov     w24, w20
        add     w20, w20, 1
.LVL21:
        .loc 1 204 72 is_stmt 0 view .LVU39
        add     x2, x23, x2, sxtw
        mov     x4, x21
        mov     x3, x25
        ldr     x0, [sp, 136]
        add     x2, x0, x2, lsl 3
        mov     x1, x19
        ldr     x0, [sp, 184]
        bl      dlarfg_
.LVL22:
.LBE12:
        .loc 1 205 72 is_stmt 1 view .LVU40
        ldr     d8, [x19]
.LVL23:
        .loc 1 206 72 view .LVU41
        str     d9, [x19]

The constant 1.0D+0 is loaded into $d9 once, before the loop, but this
register is used **after** the invocation of the dlarfg_
subroutine, and it turns out this subroutine does modify the $d9 register.
When $d9 is stored into [x19], its value is

(gdb) p $d9
$1 = ( f = inf, u = 9218868437227405312, s = 9218868437227405312 )

If I set a breakpoint at that instruction, and manually

(gdb) set $d9=1.0

then the program behaves as expected.

Bottom line, there is an issue with gfortran 9 and later on arm64 at -O1 and
above. The open questions are:
- Did gfortran incorrectly assume $d9 will not be modified (or at least, will
  be restored) by other subroutines?
- Did dlarfg_ forget to restore $d9?
- Something else?
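
As a cross-check of the gdb output above, the raw "u" value can be split into
the IEEE 754 binary64 fields. The sketch below is only an illustration:
decode_double_bits is a hypothetical helper name (not part of gdb, gcc or
OpenBLAS), and it assumes a shell with 64-bit signed arithmetic (bash or dash
on a 64-bit host) and a non-negative input below 2^63.

```shell
# Hypothetical helper: split a 64-bit pattern, given as the decimal integer
# gdb prints for "u", into the IEEE 754 binary64 sign/exponent/mantissa fields.
# Assumes 64-bit shell arithmetic and an input below 2^63.
decode_double_bits() {
    bits=$1
    sign=$(( (bits >> 63) & 1 ))              # 1 sign bit
    exponent=$(( (bits >> 52) & 0x7ff ))      # 11 exponent bits
    mantissa=$(( bits & 0xfffffffffffff ))    # 52 mantissa bits
    printf 'hex=0x%016x sign=%d exponent=0x%03x mantissa=0x%013x\n' \
        "$bits" "$sign" "$exponent" "$mantissa"
}

# Value of $d9 reported by gdb right before the faulty store
decode_double_bits 9218868437227405312
```

An all-ones exponent (0x7ff) with a zero mantissa and a zero sign bit is
+Infinity, which matches "f = inf" above. By comparison, 1.0 would have
exponent 0x3ff and a zero mantissa.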