On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > Hi all, > > PR target/65932 is a wrong-code bug affecting arm and has manifested itself > when compiling the Linux kernel, so it's something that we really > ought to fix. The problem stems from the fact that PROMOTE_MODE and > TARGET_PROMOTE_FUNCTION_MODE don't match up on arm. > PROMOTE_MODE also marks the promotion as unsigned, whereas the > TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables > being wrongly zero-extended instead of sign-extended. This also occurs > in PR target/67714. > > Jim Wilson tried a few approaches and from the discussion > on the PR and on the ML > (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html) > the preferred approach is to make PROMOTE_MODE and > TARGET_PROMOTE_FUNCTION_MODE > match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an > ABI > change so we don't want to do that. Changing PROMOTE_MODE to not > zero-extend > fixes both PR 65932 and 67714. So Jim's patch is the first patch in this > series. > > It has been posted at > https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html > and this series is based on top of the arm.h hunk of that patch. > > There have been some concerns about the codegen quality fallout from Jim's > patch, which is what the remaining patches in this series address: > > * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c. > wmul-1.c and wmul-2.c are cases where we no longer generate > sign-extend+multiply (and accumulate) instructions but instead generate > the normal full-width multiplies (the operands are sign-extended from > preceeding sign-extending loads). This is a regression for some targets > on which the sign-extending form is faster. Patches 2 and 3 address this. > gcc.target/arm/wmul-3.c is a test where we actually end up generating > better code and so the testcase just needs to be adjusted. > Patch 4 deals with that. > > * Sign-extending rather than zero-extending short values means we make more > use of the ldrsb and ldrsh arm instructions rather than the zero-extending > ldrb and ldrh. On Thumb1 targets ldrsb and ldrsh have more limited > addressing > modes (only REG + REG), which could hurt code size. However, the change also > means that we can now merge sequences of load-zero-extend followed by a > sign-extend > into a single load-sign-extend. > So we'd turn a (ldrh ; sxth) into an (ldrsh). > > I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) > and looked > at the generated assembly for -Os and -O2. I did see both effects. That is, > ldrh instructions with an immediate being turned into two instructions: > ldrh r4, [r4, #12] > -> > movs r5, #12 > ldrsh r4, [r4, r5] > > But I also observed the beneficial effect. Various register allocation > perturbations also featured in the changes > Overall, for every codebase I've looked at both -Os and -O2 the new code was > slightly smaller. Probably not enough to call this an outright win, but > definitely not a regression overall. > > As for ARM and Thumb2 states, this series doesn't have a major impact. > SPEC2006 bencharks are slightly reduced in size (but nothing to write home > about) and there are no code quality regressions. And even the regressions > caught by the testsuite in the wmul-[12].c tests don't actually manifest > in practice in SPEC, but they are fixed anyway. > > The series contains one target-independent change to CSE in patch 3 that > I'll explain there. > > The series has been bootstrapped and tested on arm, aarch64 and x86_64. > Is this ok for trunk together with Jim's arm.h hunk from > g ?
The whole series is OK provided the middle-end hunk has been approved. Thanks for working through this. Ramana > > P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on > arm. > Refer to the bugzilla entry there for the analysis of how this issue affects > that PR. > > Thanks, > Kyrill >