On 17 May 2016 at 12:02, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
>> On 17 May 2016 at 10:06, James Greenhalgh <james.greenha...@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This is just a simplification, it probably makes life easier for register
>> > allocation in some corner cases and seems the right thing to do. We don't
>> > use the internal version elsewhere, so we're safe to delete it and change
>> > the types.
>> >
>> > OK?
>> >
>> > Bootstrapped on AArch64 with no issues.
>>
>> Help me understand why this is ok for BE?
>
> The reduc_plus_scal_<mode> pattern wants to take a vector and return a scalar
> value representing the sum of the lanes of that vector. We want to go
> from V2DFmode to DFmode.
>
> The architectural instruction FADDP writes to a scalar value in the low
> bits of the register, leaving zeroes in the upper bits.
>
> i.e.
>
>   faddp  d0, v1.2d
>
>  128        64                  0
>   |   0x0   | v1.d[0] + v1.d[1] |
>
> In the current implementation, we use the
> aarch64_reduc_plus_internal<mode> pattern, which treats the result of
> FADDP as a vector of two elements. We then need an extra step to extract
> the correct scalar value from that vector. From GCC's point of view the lane
> containing the result is either lane 0 (little-endian) or lane 1
> (big-endian), which is why the current code is endian dependent. The extract
> operation will always be a NOP move from architectural bits 0-63 to
> architectural bits 0-63 - but we never elide the move as future passes can't
> be certain that the upper bits are zero (they come out of an UNSPEC so
> could be anything).
>
> However, this is all unnecessary. FADDP does exactly what we want,
> regardless of endianness, we just need to model the instruction as writing
> the scalar value in the first place. Which is what this patch wires up.
>
> We probably just missed this optimization in the migration from the
> reduc_splus optabs (which required a vector return value) to the
> reduc_plus_scal optabs (which require a scalar return value).
>
> Does that help?

Yep. Thanks. OK to commit.

/Marcus

> Thanks,
> James
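
For readers following the endianness argument, here is a minimal Python sketch of the behaviour James describes. It is not GCC code and not part of the patch. The register is modelled as a 128-bit integer, and all function names (make_v2d, faddp, gcc_result_lane, extract_gcc_lane, read_scalar) are hypothetical helpers invented for illustration. The lane mapping assumes only what the message states: the result sits in GCC lane 0 on little-endian and GCC lane 1 on big-endian.

```python
import struct

MASK64 = (1 << 64) - 1


def f64_to_bits(x: float) -> int:
    """Return the IEEE-754 double bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_f64(b: int) -> float:
    """Interpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def make_v2d(d0: float, d1: float) -> int:
    """Build a 128-bit register: v.d[0] in bits 0-63, v.d[1] in bits 64-127."""
    return f64_to_bits(d0) | (f64_to_bits(d1) << 64)


def faddp(v: int) -> int:
    """Model of `faddp d0, v1.2d`: sum in bits 0-63, zeroes in bits 64-127."""
    lo = bits_to_f64(v & MASK64)
    hi = bits_to_f64((v >> 64) & MASK64)
    return f64_to_bits(lo + hi)


def gcc_result_lane(big_endian: bool, nunits: int = 2) -> int:
    """GCC lane number that names architectural bits 0-63 (assumed mapping)."""
    return nunits - 1 if big_endian else 0


def extract_gcc_lane(reg: int, gcc_lane: int, big_endian: bool,
                     nunits: int = 2) -> float:
    """Old approach: treat the result as a vector and extract a GCC lane."""
    arch_lane = nunits - 1 - gcc_lane if big_endian else gcc_lane
    return bits_to_f64((reg >> (64 * arch_lane)) & MASK64)


def read_scalar(reg: int) -> float:
    """New approach: read the result directly as a scalar in bits 0-63."""
    return bits_to_f64(reg & MASK64)


if __name__ == "__main__":
    reg = faddp(make_v2d(1.5, 2.25))
    for be in (False, True):
        lane = gcc_result_lane(be)
        print(be, lane, extract_gcc_lane(reg, lane, be), read_scalar(reg))
```

In this model the lane number passed to the extract differs by endianness, but the bits read are always bits 0-63, so the scalar read gives the same value without any endian-dependent step.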