On 14/01/14 14:25, Richard Biener wrote: > On Tue, Jan 14, 2014 at 3:21 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> > wrote: >> Moving to gcc, I accidentally sent it to gcc-patches previously... >> >> >> On 14/01/14 14:09, Richard Biener wrote: >>> >>> On Tue, Jan 14, 2014 at 3:03 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> >>> wrote: >>>> >>>> Hi all, >>>> >>>> I'm looking into PR 54168 where we end up generating an unnecessary >>>> extend >>>> operation on arm. >>>> >>>> Given code: >>>> >>>> struct b2Body { >>>> unsigned short flags; >>>> int type; >>>> }; >>>> >>>> static _Bool IsAwake(struct b2Body *b) >>>> { >>>> return (b->flags & 2) == 2; >>>> } >>>> >>>> >>>> int foo(struct b2Body *bA, struct b2Body *bB) >>>> { >>>> int typeA = bA->type; >>>> int typeB = bB->type; >>>> _Bool activeA = IsAwake(bA) && typeA != 0; >>>> _Bool activeB = IsAwake(bB) && typeB != 0; >>>> >>>> if (!activeA && !activeB) >>>> { >>>> return 1; >>>> } >>>> >>>> return 0; >>>> } >>>> >>>> Compiled for arm-none-eabi with -O3 -march=armv7-a >>>> >>>> The inlined generated code for IsAwake contains the fragment: >>>> >>>> ldrh r0, [r1] >>>> and r0, r0, #2 >>>> uxth r0, r0 >>>> cmp r0, #0 >>>> >>>> which contains a redundant extend, which also confuses combine and >>>> prevents >>>> the whole thing from being optimised into an ldrh ; ands sequence. >>>> >>>> Looking at the tree dumps I notice that after the forwprop pass the types >>>> of >>>> the operands in the _7 = _3 & 2; statement turn into short unsigned where >>>> before that pass they were just ints: >>>> >>>> IsAwake (struct b2Body * b) >>>> { >>>> short unsigned int _3; >>>> int _4; >>>> _Bool _6; >>>> short unsigned int _7; >>>> >>>> <bb 2>: >>>> _3 = b_2(D)->flags; >>>> _4 = (int) _3; >>>> _7 = _3 & 2; >>>> _6 = _7 != 0; >>>> return _6; >>>> >>>> } >>>> >>>> >>>> I believe the C standard expects the operation to be performed in int >>>> mode. >>>> Now, since this is a bitwise and operation with a known constant 2, the >>>> operation can be safely performed in unsigned short mode. However on >>>> word-based machines like arm this would introduce unnecessary extend >>>> operations down the line, as I believe is the case here. >>> >>> Though the variant using shorts is clearly shorter (in number of stmts) >>> and thus easier to optimize. Am I correct that the issue in the end >>> is that _7 != 0 requires to extend _7? & 2 is trivially performed without >>> any extension, no? >> >> >> If I look at the dump before forwprop, the number of statements is exactly >> the same, so it's not any shorter in that sense. > > Well, it is - _4 = (int) _3; is dead, thus a zero-extension instruction > is removed. >
That's a rather short-sighted definition of removed. You've removed an extension that: a) can be merged with the preceding load b) will have to be put back again anyway, when the _4 & 2 is expanded. And finally, you end up with a second one when _5 != 0 is later expanded. R. >> <Dump from before forwprop> >> >> IsAwake (struct b2Body * b) >> { >> short unsigned int _3; >> int _4; >> int _5; >> _Bool _6; >> >> >> <bb 2>: >> _3 = b_2(D)->flags; >> _4 = (int) _3; >> _5 = _4 & 2; >> _6 = _5 != 0; >> return _6; >> >> } >> >> Using shorts is not cheaper on an architecture like arm which is word-based. >> Just the fact that we're introducing shorts already implies we're going to >> have to extend somewhere. > > But given the bit-and with '2' the extension can be removed, no? > > Richard. > >> Kyrill >> >> >>> >>> Richard. >>> >>>> Anyone have any insight on how to resolve this one? >>>> >>>> Thanks, >>>> Kyrill >>>> >> >> >