https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

            Bug ID: 101053
           Summary: Incorrect code at -O1 on arm64
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gilles.gouaillardet at gmail dot com
  Target Milestone: ---

Created attachment 51003
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51003&action=edit
A simple reproducer

This issue was initially reported at
https://github.com/numpy/numpy/issues/18422

In short, since the gcc-9 series(!), gfortran generates incorrect code for
OpenBLAS at -O1 and above on arm64.

Here is how to reproduce the issue:

# set the local prefix (to be customized)
prefix=...

# Download OpenBLAS
wget https://github.com/xianyi/OpenBLAS/releases/download/v0.3.15/OpenBLAS-0.3.15.tar.gz

# Build and install OpenBLAS
tar xfz OpenBLAS-0.3.15.tar.gz
cd OpenBLAS-0.3.15/
make -j 56 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' \
     MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' COMMON_OPT="-g -O1"
make install PREFIX=$prefix
cd ..

# Build and execute the attached reproducer
gfortran dgehd2.f90 -o dgehd2 -L$prefix/lib -Wl,-rpath,$prefix/lib -lopenblas
./dgehd2

Expected result (obtained with gfortran 8.3.1 (from rhel8) and 8.5.0, or when
OpenBLAS is built with COMMON_OPT="-g -O0"):
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
-3.5970073030870449        11.461538461538458       -3.6923076923076938     
-0.24806946917841688        4.3076923076923075        2.5384615384615383     

Current result (from gfortran 9.1.0 up to the gcc-12-20210606 snapshot):
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
               -Infinity                       NaN                       NaN   
             -Infinity                       NaN                       NaN


The faulty code is in the dgehd2 subroutine:
      PARAMETER          ( ONE = 1.0D+0 )

      DO 10 I = ILO, IHI - 1
         CALL DLARFG( IHI-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
     $                TAU( I ) )
         AII = A( I+1, I )
         A( I+1, I ) = ONE
         CALL DLARF( 'Right', IHI, IHI-I, A( I+1, I ), 1, TAU( I ),
     $               A( 1, I+1 ), LDA, WORK )
         CALL DLARF( 'Left', IHI-I, N-I, A( I+1, I ), 1, TAU( I ),
     $               A( I+1, I+1 ), LDA, WORK )
         A( I+1, I ) = AII
   10 CONTINUE

The problem occurs at the following line:
         A( I+1, I ) = ONE


Here is a snippet of the assembly (generated with gfortran 10.3.0):
.LBE9:  
        .loc 1 206 72 view .LVU34
        fmov    d9, 1.0e+0
.LBB10: 
        .loc 1 211 72 view .LVU35
        adrp    x0, .LC1
        add     x0, x0, :lo12:.LC1
        str     x0, [sp, 192]
.LBE10:
.LBB11: 
        .loc 1 216 72 view .LVU36
        adrp    x0, .LC2
        add     x0, x0, :lo12:.LC2
        str     x0, [sp, 200]
.LVL20:
.L7:
        .loc 1 216 72 is_stmt 0 view .LVU37
.LBE11:
.LBB12: 
        .loc 1 204 72 is_stmt 1 view .LVU38
        ldr     w0, [x22]
        sub     w0, w0, w20
        str     w0, [sp, 224]
        add     w0, w20, 2
        ldr     w2, [x26]
        cmp     w2, w0
        csel    w2, w2, w0, le
        mov     w24, w20
        add     w20, w20, 1
.LVL21: 
        .loc 1 204 72 is_stmt 0 view .LVU39
        add     x2, x23, x2, sxtw
        mov     x4, x21
        mov     x3, x25
        ldr     x0, [sp, 136]
        add     x2, x0, x2, lsl 3
        mov     x1, x19
        ldr     x0, [sp, 184]
        bl      dlarfg_
.LVL22:
.LBE12: 
        .loc 1 205 72 is_stmt 1 view .LVU40
        ldr     d8, [x19]
.LVL23: 
        .loc 1 206 72 view .LVU41
        str     d9, [x19]


The constant 1.0D+0 is held in $d9, but that register is only read **after**
the call to the dlarfg_ subroutine, and it turns out that dlarfg_ modifies
$d9.
By the time $d9 is stored into [x19], its value is:
(gdb) p $d9
$1 = ( f = inf, u = 9218868437227405312, s = 9218868437227405312 )
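
As a sanity check on that gdb output, the integer shown in the u field can be
reinterpreted as an IEEE-754 double. The following is only a small sketch; the
helper name bits_to_double is made up for illustration and is not part of gdb
or OpenBLAS.

```python
import struct

def bits_to_double(u: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", u))[0]

raw = 9218868437227405312        # the "u" field printed by gdb for $d9
print(hex(raw))                  # 0x7ff0000000000000: exponent all ones, mantissa zero
print(bits_to_double(raw))       # inf

# For comparison, the bit pattern the register should have held (1.0):
print(bits_to_double(0x3FF0000000000000))
```

So the register holds +Infinity rather than 1.0, which matches the f field
printed by gdb.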

If I set a breakpoint at that instruction, and manually
(gdb) set $d9=1.0
then the program behaves as expected.


To summarize, with gfortran 9 and later on arm64, this code misbehaves at -O1
and above. Open questions:
 - Did gfortran incorrectly assume $d9 will not be modified (or at least, will
be restored) by other subroutines?
 - Did dlarfg_ forget to restore $d9?
 - Something else?
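
For context on the first two questions: under the AArch64 procedure call
standard (AAPCS64), d8 through d15 are callee-saved, so a caller may keep a
value in d9 across a call, and a callee that writes d9 is expected to save and
restore it. A rough way to check one function's disassembly for that pattern
is sketched below. This is a heuristic only: scan_register and the two sample
listings are hypothetical, the scan ignores writes made through the v9/q9/s9
views of the same register, and it assumes saves and restores go through sp.

```python
import re

# Strip an objdump-style "address: encoding" prefix, if present.
PREFIX = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+")

def scan_register(disasm, reg="d9"):
    """Heuristically report whether `reg` is written, saved and restored."""
    saved = restored = written = False
    for raw in disasm.splitlines():
        line = PREFIX.sub("", raw).split("//")[0].strip()
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue                      # labels, "ret", blank lines
        mnemonic, operands = parts
        regs = []                         # register operands before any "[...]"
        for op in operands.split(","):
            op = op.strip()
            if op.startswith("["):
                break
            regs.append(op)
        on_stack = "[sp" in operands
        if mnemonic in ("str", "stp", "stur"):
            saved = saved or (reg in regs and on_stack)
        elif mnemonic in ("ldr", "ldp", "ldur"):
            if reg in regs:
                if on_stack:
                    restored = True       # reload from the stack frame
                else:
                    written = True        # ordinary load into the register
        elif mnemonic not in ("fcmp", "fcmpe") and regs and regs[0] == reg:
            written = True                # first operand is the destination
    return {"written": written, "saved": saved, "restored": restored}

# Hypothetical listings, not taken from the actual dlarfg_ code.
clobbering = "fmov d9, d0\nfmul d0, d9, d1\nret"
well_behaved = "stp d8, d9, [sp, 16]\nfmov d9, d0\nldp d8, d9, [sp, 16]\nret"
print(scan_register(clobbering))
print(scan_register(well_behaved))
```

A result with written set but saved or restored unset would point at the
callee; the real disassembly of dlarfg_ would have to be fed in to draw any
conclusion.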
