https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69460
Bug ID: 69460
Summary: ARM Cortex M0 produces suboptimal code vs Cortex M3
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: strntydog at gmail dot com
Target Milestone: ---

Created attachment 37451
Test Source

Tested on GCC 4.9 and 5.2 under Linux (Ubuntu 15.10, 64-bit), using the pre-built toolchains available at https://launchpad.net/gcc-arm-embedded.

When compiling for a Cortex M0 target, I noticed poor code generation with regard to literal tables. I compared it with the code generated for the Cortex M3, which is much better. It became apparent that the code generated for the M3 was actually legal M0 code and so could execute unmodified on an M0 core. The Cortex M0 target is therefore needlessly producing suboptimal code compared with the same source compiled for the Cortex M3.

There are six tests in the test case, each accessing memory with a different pattern. ALL of them generate suboptimal code for the Cortex M0 compared with the Cortex M3 code generator, yet all of the code the M3 generator produces is legal Cortex M0 code.
Example of the suboptimal code generation, Test 6:

    #include <stdint.h>

    /* Write 8 bit values to known register locations - using an array */
    void test6(void)
    {
        volatile uint8_t* const r = (uint8_t*)(0x40002800U); // Register Array
        r[0]  = 0xFF;
        r[1]  = 0xFE;
        r[2]  = 0xFD;
        r[3]  = 0xFC;
        r[4]  = 0xEE;
        r[8]  = 0xDD;
        r[12] = 0xCC;
    }

Which, at -Os with -mcpu=cortex-m0, results in:

    000000ec <test6>:
      ec:  22ff      movs  r2, #255       ; 0xff
      ee:  4b0a      ldr   r3, [pc, #40]  ; (118 <test6+0x2c>)
      f0:  701a      strb  r2, [r3, #0]
      f2:  4b0a      ldr   r3, [pc, #40]  ; (11c <test6+0x30>)
      f4:  3a01      subs  r2, #1
      f6:  701a      strb  r2, [r3, #0]
      f8:  4b09      ldr   r3, [pc, #36]  ; (120 <test6+0x34>)
      fa:  3a01      subs  r2, #1
      fc:  701a      strb  r2, [r3, #0]
      fe:  4b09      ldr   r3, [pc, #36]  ; (124 <test6+0x38>)
     100:  3a01      subs  r2, #1
     102:  701a      strb  r2, [r3, #0]
     104:  4b08      ldr   r3, [pc, #32]  ; (128 <test6+0x3c>)
     106:  3a0e      subs  r2, #14
     108:  701a      strb  r2, [r3, #0]
     10a:  4b08      ldr   r3, [pc, #32]  ; (12c <test6+0x40>)
     10c:  3a11      subs  r2, #17
     10e:  701a      strb  r2, [r3, #0]
     110:  4b07      ldr   r3, [pc, #28]  ; (130 <test6+0x44>)
     112:  3a11      subs  r2, #17
     114:  701a      strb  r2, [r3, #0]
     116:  4770      bx    lr
     118:  40002800  .word 0x40002800
     11c:  40002801  .word 0x40002801
     120:  40002802  .word 0x40002802
     124:  40002803  .word 0x40002803
     128:  40002804  .word 0x40002804
     12c:  40002808  .word 0x40002808
     130:  4000280c  .word 0x4000280c

Each element accessed in the array of bytes has resulted in the address of that element appearing in the literal table!
By comparison, the M3 build generates:

    00000094 <test6>:
      94:  4b07      ldr   r3, [pc, #28]  ; (b4 <test6+0x20>)
      96:  22ff      movs  r2, #255       ; 0xff
      98:  701a      strb  r2, [r3, #0]
      9a:  22fe      movs  r2, #254       ; 0xfe
      9c:  705a      strb  r2, [r3, #1]
      9e:  22fd      movs  r2, #253       ; 0xfd
      a0:  709a      strb  r2, [r3, #2]
      a2:  22fc      movs  r2, #252       ; 0xfc
      a4:  70da      strb  r2, [r3, #3]
      a6:  22ee      movs  r2, #238       ; 0xee
      a8:  711a      strb  r2, [r3, #4]
      aa:  22dd      movs  r2, #221       ; 0xdd
      ac:  721a      strb  r2, [r3, #8]
      ae:  22cc      movs  r2, #204       ; 0xcc
      b0:  731a      strb  r2, [r3, #12]
      b2:  4770      bx    lr
      b4:  40002800  .word 0x40002800

ALL of which is LEGAL M0 code.

The Cortex M0 compile is: 72 bytes long, 22 instructions, 7 literal table entries and 7 reads from code space.
The Cortex M3 compile (which generates legal M0 code) is: 36 bytes long, 16 instructions, 1 literal table entry and 1 read from code space.

That is significantly more efficient in every respect. Given that Cortex M0 cores usually have fewer resources than Cortex M3 cores, I would expect the code generation to be the same for both, unless the M3 build can use an instruction that only exists on the Cortex M3. This inefficient code generation makes Cortex M0 cores seem much less efficient and much slower than they really are.

Attached is the test case and a script to build it. The script builds the code for M0 and M3, then dumps the M3 assembler output, patches it so that it can be assembled as M0 assembler, and assembles the result. The purpose is to confirm that the M3-generated code is LEGAL M0 code, which it is.