https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89877

            Bug ID: 89877
           Summary: [ARC] miscompilation due to missing cc clobber in
                    longlong.h: add_ssaaaa()/sub_ddmmss()
           Product: gcc
           Version: 8.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vgupta at synopsys dot com
  Target Milestone: ---

Created attachment 46051
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46051&action=edit
test case, build with -O2 to show issue

A glibc build with -mcpu=hs4x showed wrong printed values for the test case
below (the problem was first seen in a multibench test harness printing wrong
values)

#include <stdio.h>

void main(int argc, char *argv[])
{
        size_t total_time = 115424;
        double secs = (double)total_time/(double)1000;
        printf("%s %d %lf\n", "secs", total_time, secs);  // prints 113.504, expected 115.424
        printf("%d\n", (size_t)secs);
}

The code path leads to glibc stdlib/divrem.c: __mpn_divrem(), which in turn
uses target-defined inline asm macros in stdlib/longlong.h (which in turn is
synced from gcc include/longlong.h)

These inline macros clobber the CPU flags but do not list "cc" in their
clobber list.
This allows gcc to schedule a flag-setting CMP instruction (or ADD.f) before
the clobbering ADD.f/SUB.f instructions, so that a subsequent conditional
branch uses a stale flag.
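
For illustration, a minimal sketch of what an add_ssaaaa() with the "cc"
clobber could look like. This is not the actual longlong.h text: the plain "r"
constraints, the uint32_t types, the portable fallback branch and the helper
add_double_word() are assumptions made so the sketch also builds on a non-ARC
host, and the ARC branch has not been tested here.

```c
#include <stdint.h>

/* Two-word add: (sh:sl) = (ah:al) + (bh:bl), carry propagated from low to high. */
#if defined(__arc__)
/* ADD.f sets the carry flag and ADC consumes it, so the asm must declare
   "cc" as clobbered; otherwise the compiler may assume the flags set by an
   earlier CMP are still valid after this statement. */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                              \
  __asm__ ("add.f %1, %4, %5\n\t"                                       \
           "adc   %0, %2, %3"                                           \
           : "=r" (sh), "=&r" (sl) /* sl is written before ah/bh are read */ \
           : "r" ((uint32_t) (ah)), "r" ((uint32_t) (bh)),              \
             "r" ((uint32_t) (al)), "r" ((uint32_t) (bl))               \
           : "cc")
#else
/* Portable fallback (assumption, for non-ARC hosts): same arithmetic in C. */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                              \
  do {                                                                  \
    uint32_t lo_ = (uint32_t) (al) + (uint32_t) (bl);                   \
    (sh) = (uint32_t) (ah) + (uint32_t) (bh) + (lo_ < (uint32_t) (al)); \
    (sl) = lo_;                                                         \
  } while (0)
#endif

/* Hypothetical helper: add two 64-bit values through the two-word macro. */
uint64_t add_double_word(uint64_t a, uint64_t b)
{
    uint32_t sh, sl;
    add_ssaaaa(sh, sl,
               (uint32_t) (a >> 32), (uint32_t) a,
               (uint32_t) (b >> 32), (uint32_t) b);
    return ((uint64_t) sh << 32) | sl;
}
```

sub_ddmmss() would need the same "cc" clobber, since its SUB.f also writes the
flags.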

-mcpu=hs4x (no fix)
-------------------
__mpn_divrem:
...
.L135:
...
        st    -1,[r0]
        cmp_s r10,-1            <-- intended flag
        sub   r0,r0,4
        sub   r4,r2,r9
        add.f r2, r18, r9       <-- clobbered
        adc   r3, r4, 0
        beq_s @.L72             <-- stale flag used

-mcpu=hs4x + cc clobber fix
---------------------------
        st    -1,[r0]
        sub   r4,r2,r9
        sub   r0,r0,4
        add.f r2, r18, r9
        adc   r3, r4, 0
        cmp_s r10,-1            <-- intended flag
        beq_s @.L72             <-- right flag used

The issue doesn't happen with the default -mcpu=hs38, as the instruction
scheduling there already places the CMP after the ADD.f/ADC pair for some
reason.

-mcpu=hs38
----------
        st    -1,[r0]
        sub   r4,r2,r9
        sub   r0,r0,4
        add.f r2, r18, r9
        adc   r3, r4, 0
        cmp_s r10,-1
        beq_s @.L72
