Yao Qi wrote:
> Hi,
> We are looking for some possible improvements and optimizations on
> thumb2 code size.  Currently, I am running some benchmarks with
> compilation flag "-Os -march=armv7-a -mthumb", and hope to find some
> thing interesting that we can improve.  Beside that, do you have some
> ideas on this topic? or do you have some observations on thumb2 code
> that we may probably improve the size?
>
> Any thoughts on this are appreciated.
I found some new possible improvements.  Your comments on them are welcome.
See more details in https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SizeOptimize

10. Replace multiple vldr by vldm

Observed in bezier01float/bez.o,

   8:   f100 0438   add.w   r4, r0, #56     ; 0x38
   c:   b085        sub     sp, #20
   e:   2600        movs    r6, #0
  10:   e03d        b.n     8e <interpolatePoints+0x8e>
  12:   e954 2302   ldrd    r2, r3, [r4, #-8]
  16:   2500        movs    r5, #0
  18:   ed14 ab0e   vldr    d10, [r4, #-56] ; 0xffffffc8  // <--
  1c:   ed14 bb0c   vldr    d11, [r4, #-48] ; 0xffffffd0  // <--
  20:   ed14 cb0a   vldr    d12, [r4, #-40] ; 0xffffffd8  // <--
  24:   ed14 db08   vldr    d13, [r4, #-32] ; 0xffffffe0  // <--
  28:   e9cd 2300   strd    r2, r3, [sp]
  2c:   ed14 eb06   vldr    d14, [r4, #-24] ; 0xffffffe8  // <--

The five marked vldr instructions load consecutive registers d10-d14
from consecutive doublewords (offsets -56 to -24 from r4), so they can
be replaced by one vldm.

11. Replace str/ldr by memcpy

Observed in bezier01fixed/pointio.o:outputPoints()

00000000 <outputPoints>:
   0:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4:   4604        mov     r4, r0
   6:   b089        sub     sp, #36 ; 0x24
   8:   2600        movs    r6, #0
   a:   460f        mov     r7, r1
   c:   e025        b.n     5a <outputPoints+0x5a>
   e:   68e3        ldr     r3, [r4, #12]
  10:   2500        movs    r5, #0
  12:   e894 0e00   ldmia.w r4, {r9, sl, fp}
  16:   9303        str     r3, [sp, #12]
  18:   6923        ldr     r3, [r4, #16]
  1a:   9304        str     r3, [sp, #16]
  1c:   6963        ldr     r3, [r4, #20]
  1e:   9305        str     r3, [sp, #20]
  20:   69a3        ldr     r3, [r4, #24]
  22:   9306        str     r3, [sp, #24]
  24:   69e3        ldr     r3, [r4, #28]
  26:   9307        str     r3, [sp, #28]

The five ldr/str pairs copy the words at offsets 12 to 28 from r4 to the
stack one at a time.  Code size will be smaller if we replace these
ldr/str pairs by a call to memcpy().

12. uxth/sxth

Observed in automotive/idctrn01/bmark.c

short unPack( unsigned char c )
{
    /* Only want lower four bit nibble */
    c = c & (unsigned char)0x0F ;

    if( c > 7 ) {
        /* Negative nibble */
        return( ( short )( c - 16 ) ) ;
    }
    else
    {
        /* positive nibble */
        return( ( short )c ) ;
    }
}

GCC produces code like this,

00000024 <unPack>:
  24:   f000 000f   and.w   r0, r0, #15
  28:   2807        cmp     r0, #7
  2a:   d901        bls.n   30 <unPack+0xc>
  2c:   3810        subs    r0, #16
  2e:   b280        uxth    r0, r0      <--[1]
  30:   b200        sxth    r0, r0      <--[2]
  32:   4770        bx      lr

Are instructions [1] and [2] redundant?
Can we remove these two instructions?  If they are redundant, we can remove them safely.