https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683
Bug ID: 82683
Summary: GCC generates bad code with -tune=thunderx2t99
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sje at gcc dot gnu.org
Target Milestone: ---

Created attachment 42448
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42448&action=edit
Test case

I am compiling the GCC benchmark from SPEC 2017 on aarch64. If I compile it
with -mtune=thunderxt88 it works, and if I compile it with -mtune=thunderx2t99
it fails. The tune option should affect the speed of a program on different
architectures, but it should never result in bad code.

I have attached a cut-down test case (compilable but not runnable) to show the
problem. In the good case you should see two sxtw sign-extend instructions:

        sxtw    x20, w0
        cbz     x1, .L2
        ldr     w0, [x1, x20, lsl 2]
        sxtw    x20, w0 // 21
.L2:

In the bad case we only get one:

        sxtw    x20, w0
        cbz     x1, .L2
        ldr     w0, [x1, x20, lsl 2]
.L2:

If I insert the missing sxtw by hand, everything works fine for me. The sxtw
seems to go missing during combine, but I do not know why. Notice that in
addition to not doing the sxtw, the bad code leaves the loaded value in w0 and
does not put it in x20 like the good code does.

In addition to the -mtune argument I am compiling with:

        -std=c11 -O2 -fno-inline -fno-schedule-insns -fno-schedule-insns2 -fno-strict-aliasing