https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89112

            Bug ID: 89112
           Summary: Incorrect code generated by rs6000 memcmp expansion
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: samuel at sholland dot org
  Target Milestone: ---

Created attachment 45562
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45562&action=edit
Slightly simplified reproducer -- derived from Net::SSLeay constants.c

In a function with a large number of calls to memcmp(), incorrect code is
generated for some comparisons larger than 32 bytes. This incorrect code causes
the comparison to erroneously fail or to crash. It appears that part of the
generated comparison loop is merged between different calls to memcmp() -- I
don't know if this is part of the problem.
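
For anyone checking their own build independently of the attachment, one option
is to compare the sign of the compiler's memcmp() result with a bytewise
reference at a constant length of 33. This is only a minimal sketch: the names
`ref_memcmp`, `sign`, and `check33` are hypothetical and not taken from the
reproducer, and a function this small may well not trigger the bug, since the
failure was seen in a function with a large number of memcmp() calls.

```c
#include <string.h>

/* Bytewise reference with memcmp() semantics; returns -1, 0, or 1. */
static int ref_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < n; i++)
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;
    return 0;
}

/* Reduce any int to -1, 0, or 1 so only the sign is compared. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 when memcmp() with a constant length of 33 (a candidate for
   inline expansion) agrees in sign with the bytewise reference.
   Both buffers must be at least 33 bytes long. */
static int check33(const char *a, const char *b)
{
    return sign(memcmp(a, b, 33)) == sign(ref_memcmp(a, b, 33));
}
```

A result of 0 from `check33` on two readable 33-byte buffers would indicate a
disagreement between the two comparisons.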

I've simplified the code a bit and attached the `reproducer.i` and `log`
generated with the following command:

    gcc -v -save-temps reproducer.c -O2 &>log

Here is the relevant assembly for the length == 33 switch case:

<constant+1040>:      li      r9,2
<constant+1044>:      addis   r6,r2,-2
<constant+1048>:      li      r10,0
<constant+1052>:      li      r8,8
<constant+1056>:      mtctr   r9
<constant+1060>:      addi    r6,r6,-12152
<constant+1064>:      ldx     r9,r3,r10
<constant+1068>:      ldx     r7,r6,r10
<constant+1072>:      ldx     r4,r3,r8
<constant+1076>:      ldx     r5,r6,r8
<constant+1080>:      addi    r10,r10,16
<constant+1084>:      addi    r8,r8,16
<constant+1088>:      cmpld   cr7,r9,r7
<constant+1092>:      bne     cr7,0x3ffff7c6760c <constant+1100>
<constant+1096>:      b       0x3ffff7c72110 <constant+44880>
<constant+1100>:      setb    r9,cr7
<constant+1104>:      cmpwi   cr7,r9,0
<constant+1108>:      bne     cr7,0x3ffff7c6761c <constant+1116>
<constant+1112>:      b       0x3ffff7c72104 <constant+44868>
<constant+1116>:      mflr    r0
<constant+1120>:      std     r0,48(r1)
<constant+1124>:      b       0x3ffff7c67300 <constant+320>
...
<constant+44880>:     cmpld   cr7,r4,r5
<constant+44884>:     bdz     0x3ffff7c72120 <constant+44896>
<constant+44888>:     beq     cr7,0x3ffff7c72120 <constant+44896>
<constant+44892>:     b       0x3ffff7c72124 <constant+44900>
<constant+44896>:     b       0x3ffff7c675e8 <constant+1064>
<constant+44900>:     beq     cr7,0x3ffff7c7212c <constant+44908>
<constant+44904>:     b       0x3ffff7c6760c <constant+1100>
<constant+44908>:     add     r3,r3,r10
<constant+44912>:     add     r10,r6,r10
<constant+44916>:     addi    r9,r3,-7
<constant+44920>:     addi    r10,r10,-7
<constant+44924>:     ld      r9,0(r9)
<constant+44928>:     ld      r10,0(r10)
<constant+44932>:     cmpld   cr7,r9,r10
<constant+44936>:     b       0x3ffff7c6760c <constant+1100>

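Note that when the bdz at +44884 is taken because we've reached the end of the
16-byte comparison loop (here, after comparing 32 of the 33 bytes), control
goes to +44896, which branches back to +1064 and compares another 16 bytes!
From the code shown, the final comparison at +44908 appears to be unreachable:
it is only entered from the beq at +44900, which is only reached when cr7 is
not equal.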
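
To make the expected behavior concrete, here is a rough C model of what I would
expect the 33-byte case to do: two 16-byte loop iterations, then a single
overlapping 8-byte load covering the last byte, with no branch back into the
loop. This is a sketch under my own assumptions, not GCC's actual expansion;
`load_be64` and `model_memcmp33` are hypothetical names, and the big-endian
load is only there so that an unsigned compare matches memcmp() ordering on any
host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes as a big-endian value so that an unsigned comparison
   orders the same way memcmp() does, regardless of host endianness. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical model of the intended flow for a 33-byte compare. */
static int model_memcmp33(const void *a, const void *b)
{
    const unsigned char *pa = a, *pb = b;
    size_t off = 0;

    /* Two iterations of 16 bytes each, matching the count of 2 in CTR. */
    for (int iter = 0; iter < 2; iter++) {
        for (int half = 0; half < 2; half++) {
            uint64_t x = load_be64(pa + off);
            uint64_t y = load_be64(pb + off);
            if (x != y)
                return x < y ? -1 : 1;
            off += 8;
        }
    }

    /* Loop exhausted: exactly 32 bytes consumed, so go to the tail and
       never re-enter the loop. */
    assert(off == 32);

    /* One overlapping 8-byte load at off - 7 (bytes 25..32) covers the
       final byte, like the addi -7 / ld pair at +44916..+44928. */
    uint64_t x = load_be64(pa + off - 7);
    uint64_t y = load_be64(pb + off - 7);
    return x == y ? 0 : (x < y ? -1 : 1);
}
```

In this model no byte at offset 33 or beyond is ever read, whereas the generated
code above loads bytes 32..47 on the extra iteration.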

If I compile this file with -mblock-compare-inline-loop-limit=32 or some lower
value, I get the expected results (the comparisons all succeed).

If I compile the file with -O1 or lower, it also succeeds. I assume this is
because the memcmp expansion is not enabled at those optimization levels.

This was noticed as 8 failures in the Net::SSLeay test suite, files
t/local/20_autoload.t and t/local/21_constants.t (two of the failures being
related to the OCSP_RESPONSE_* constants in the simplified code attached). The
test suite passes with -mblock-compare-inline-loop-limit=32 or lower.
