https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89112
            Bug ID: 89112
           Summary: Incorrect code generated by rs6000 memcmp expansion
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: samuel at sholland dot org
  Target Milestone: ---

Created attachment 45562
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45562&action=edit
Slightly simplified reproducer -- derived from Net::SSLeay constants.c

In a function with a large number of calls to memcmp(), incorrect code is
generated for some comparisons larger than 32 bytes. This incorrect code causes
the comparison to erroneously fail or to crash. It appears that part of the
generated comparison loop is merged between different calls to memcmp() -- I
don't know if this is part of the problem.

I've simplified the code a bit and attached the `reproducer.i` and `log`
generated with the following command:

    gcc -v -save-temps reproducer.c -O2 &>log

Here is the relevant assembly for the length == 33 switch case:

   <constant+1040>:     li      r9,2
   <constant+1044>:     addis   r6,r2,-2
   <constant+1048>:     li      r10,0
   <constant+1052>:     li      r8,8
   <constant+1056>:     mtctr   r9
   <constant+1060>:     addi    r6,r6,-12152
   <constant+1064>:     ldx     r9,r3,r10
   <constant+1068>:     ldx     r7,r6,r10
   <constant+1072>:     ldx     r4,r3,r8
   <constant+1076>:     ldx     r5,r6,r8
   <constant+1080>:     addi    r10,r10,16
   <constant+1084>:     addi    r8,r8,16
   <constant+1088>:     cmpld   cr7,r9,r7
   <constant+1092>:     bne     cr7,0x3ffff7c6760c <constant+1100>
   <constant+1096>:     b       0x3ffff7c72110 <constant+44880>
   <constant+1100>:     setb    r9,cr7
   <constant+1104>:     cmpwi   cr7,r9,0
   <constant+1108>:     bne     cr7,0x3ffff7c6761c <constant+1116>
   <constant+1112>:     b       0x3ffff7c72104 <constant+44868>
   <constant+1116>:     mflr    r0
   <constant+1120>:     std     r0,48(r1)
   <constant+1124>:     b       0x3ffff7c67300 <constant+320>
   ...
   <constant+44880>:    cmpld   cr7,r4,r5
   <constant+44884>:    bdz     0x3ffff7c72120 <constant+44896>
   <constant+44888>:    beq     cr7,0x3ffff7c72120 <constant+44896>
   <constant+44892>:    b       0x3ffff7c72124 <constant+44900>
   <constant+44896>:    b       0x3ffff7c675e8 <constant+1064>
   <constant+44900>:    beq     cr7,0x3ffff7c7212c <constant+44908>
   <constant+44904>:    b       0x3ffff7c6760c <constant+1100>
   <constant+44908>:    add     r3,r3,r10
   <constant+44912>:    add     r10,r6,r10
   <constant+44916>:    addi    r9,r3,-7
   <constant+44920>:    addi    r10,r10,-7
   <constant+44924>:    ld      r9,0(r9)
   <constant+44928>:    ld      r10,0(r10)
   <constant+44932>:    cmpld   cr7,r9,r10
   <constant+44936>:    b       0x3ffff7c6760c <constant+1100>

Note that when the bdz is taken, because we've reached the end of the 16-byte
comparison loop (here, after comparing 32 of the 33 bytes), control goes back
to the top of the loop at <constant+1064> and compares another 16 bytes!

If I compile this file with -mblock-compare-inline-loop-limit=32 or some lower
value, I get the expected results (the comparisons all succeed). If I compile
the file with -O1 or lower, it also succeeds. I assume this is because the
memcmp expansion is not enabled at those optimization levels.

This was noticed as 8 failures in the Net::SSLeay test suite, in files
t/local/20_autoload.t and t/local/21_constants.t (two of the failures are
related to the OCSP_RESPONSE_* constants in the simplified code attached). The
test suite passes with -mblock-compare-inline-loop-limit=32 or lower.
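
For reference, here is a rough C model of the control flow I would expect the
33-byte expansion to implement, as far as I can tell from the disassembly: two
loop iterations of 16 bytes each, then a single overlapping 8-byte compare
covering bytes 25..32. This is only a sketch under that assumption, not GCC's
actual expansion; the names model_memcmp33 and sign3 are made up for
illustration, and memcmp() on 8-byte chunks stands in for the ldx/cmpld pairs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapse a memcmp() result to -1, 0 or 1,
   similar in spirit to what setb produces from cr7. */
static int sign3(int r)
{
    return (r > 0) - (r < 0);
}

/* Hypothetical model of the expected 33-byte compare (not GCC's code). */
static int model_memcmp33(const unsigned char *a, const unsigned char *b)
{
    size_t off = 0;

    /* ctr starts at 2, as set by "li r9,2; mtctr r9". */
    for (int ctr = 2; ctr > 0; ctr--) {
        int r = memcmp(a + off, b + off, 8);      /* first ldx pair */
        if (r != 0)
            return sign3(r);
        r = memcmp(a + off + 8, b + off + 8, 8);  /* second ldx pair */
        if (r != 0)
            return sign3(r);
        off += 16;
        /* Once ctr reaches zero the loop has to fall through to the
           tail below rather than branch back to the loop top. */
    }

    /* off == 32 here. Bytes 25..31 are already known to be equal, so an
       8-byte compare at off - 7 is decided by byte 32 alone. */
    return sign3(memcmp(a + off - 7, b + off - 7, 8));
}
```

Calling model_memcmp33(x, y) on two 33-byte buffers should give 0 when they
are equal and -1 or 1 according to the first differing byte, and it never
reads past byte 32 of either buffer.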