On Fri, Apr 20, 2012 at 3:43 PM, Matt Turner <[email protected]> wrote:
> On Thu, Apr 19, 2012 at 5:40 PM, Matt Turner <[email protected]> wrote:
>> Uses the pmadd technique described in
>> http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
>> +static force_inline __m64
>> +pack_4xpacked565 (__m64 a, __m64 b)
>> +{
>> +    __m64 rb0 = _mm_and_si64 (a, MC (packed_565_rb));
>> +    __m64 rb1 = _mm_and_si64 (b, MC (packed_565_rb));
>> +
>> +    __m64 t0 = _mm_madd_pi16 (rb0, MC (565_pack_multiplier));
>> +    __m64 t1 = _mm_madd_pi16 (rb1, MC (565_pack_multiplier));
>> +
>> +    __m64 g0 = _mm_and_si64 (a, MC (packed_565_g));
>> +    __m64 g1 = _mm_and_si64 (b, MC (packed_565_g));
>> +
>> +    t0 = _mm_or_si64 (t0, g0);
>> +    t1 = _mm_or_si64 (t1, g1);
>> +
>> +    t0 = shift(t0, -5);
>> +    t1 = shift(t1, -5 + 16);
>> +
>> +    return _mm_shuffle_pi16 (_mm_or_si64 (t0, t1), _MM_SHUFFLE (3, 1, 2, 0));
>> +}
>
> I think the return statement can be simplified with a _mm_packs_pi32,
> but I couldn't get it to work. If someone has a chance to take a look,
> I'd be very appreciative.
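
For anyone following the quoted patch, here is a rough scalar model of what one 32-bit lane of pack_4xpacked565 appears to compute. It is only a sketch, not pixman code: the mask and multiplier values are assumptions inferred from the constant names in the patch (packed_565_rb, packed_565_g, 565_pack_multiplier), the pixel layout is assumed to be 0x00RRGGBB, and pack_565_madd / pack_565_ref are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed per-lane constants (inferred from the names used in the patch). */
#define PACKED_565_RB 0x00f800f8u /* top 5 bits of R and of B */
#define PACKED_565_G  0x0000fc00u /* top 6 bits of G */
#define PACK_MUL_HI   0x2000      /* multiplies the R word: moves R5 to bit 16 */
#define PACK_MUL_LO   0x0004      /* multiplies the B word: moves B5 to bit 5 */

/* Scalar model of one 32-bit lane: mask, multiply-add, OR in green, shift. */
static uint16_t pack_565_madd(uint32_t p)
{
    uint32_t rb = p & PACKED_565_RB;
    int16_t lo = (int16_t)(rb & 0xffff); /* B & 0xf8 */
    int16_t hi = (int16_t)(rb >> 16);    /* R & 0xf8 */

    /* pmaddwd: multiply the two signed 16-bit words and sum into 32 bits. */
    int32_t t = (int32_t)hi * PACK_MUL_HI + (int32_t)lo * PACK_MUL_LO;

    /* Now R5 << 16 | B5 << 5; green lands at G6 << 10 without any shift. */
    uint32_t u = (uint32_t)t | (p & PACKED_565_G);

    /* The final right shift by 5 yields R5 << 11 | G6 << 5 | B5. */
    return (uint16_t)(u >> 5);
}

/* Plain shift-and-mask conversion, used only as a cross-check. */
static uint16_t pack_565_ref(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xf800) |
                      ((p >> 5) & 0x07e0) |
                      ((p >> 3) & 0x001f));
}
```

With these assumed constants the two helpers agree, which suggests the multiply-add replaces the two separate shifts that red and blue would otherwise need.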
I realized in talking with Søren on IRC that the code in the PDF converts
to 555, which is what allows packssdw to work: a 555 value never exceeds
0x7fff, so the signed saturation leaves it untouched. A 565 value can have
its top bit set, so we'd need packusdw here, but that wasn't added until
SSE 4.1.

It looks like the ffmpeg 888 -> 565 MMX code unpacks the input in a way
that avoids needing to repack it at the end, but I don't think that is an
improvement over an extra shuffle at the end. I'll play with it some and
see.
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
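
To make the packssdw / packusdw point concrete, here is a small scalar sketch of how each instruction narrows one 32-bit lane to 16 bits. The helper names pack_signed_sat and pack_unsigned_sat are hypothetical. They only model the saturation behaviour of packssdw (_mm_packs_pi32) and packusdw (_mm_packus_epi32, SSE 4.1), not the lane ordering.

```c
#include <stdint.h>

/* One lane of packssdw: clamp to the signed 16-bit range. */
static uint16_t pack_signed_sat(int32_t v)
{
    if (v > 32767)
        v = 32767;
    if (v < -32768)
        v = -32768;
    return (uint16_t)(int16_t)v;
}

/* One lane of packusdw: clamp to the unsigned 16-bit range. */
static uint16_t pack_unsigned_sat(int32_t v)
{
    if (v < 0)
        v = 0;
    if (v > 65535)
        v = 65535;
    return (uint16_t)v;
}
```

A 555 pixel such as 0x7c00 passes through pack_signed_sat unchanged, whereas a 565 pixel such as 0xf800 is clamped to 0x7fff. Only the unsigned variant preserves it, which matches the reason packs_pi32 could not simply replace the shuffle.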
