On Sun, Aug 03, 2014 at 12:36:19AM +0200, Michael Niedermayer wrote:
> On Sat, Aug 02, 2014 at 11:34:07PM +0200, Clément Bœsch wrote:
[...]
> > +#ifdef TEST
> > +#define W1 320
> > +#define H1 240
> > +#define W2 640
> > +#define H2 480
> > +int main(void)
> > +{
> > +    int i, a, ret = 0;
> > +    DECLARE_ALIGNED(32, uint32_t, buf1)[W1*H1];
> > +    DECLARE_ALIGNED(32, uint32_t, buf2)[W2*H2];
> > +    uint32_t state = 0;
> > +
> > +    for (i = 0; i < W1*H1; i++) {
> > +        buf1[i] = state;
> > +        state = state * 1664525 + 1013904223;
> > +    }
> > +
> > +    for (i = 0; i < W2*H2; i++) {
> > +        buf2[i] = state;
> > +        state = state * 1664525 + 1013904223;
> > +    }
>
> the code should in addition be tested with maximal and minimal
> difference cases
>

Tests added.

> [...]
> > +;-------------------------------------------------------------------------------
> > +; int ff_pixelutils_sad_[au]_16x16_sse(const uint8_t *src1, ptrdiff_t stride1,
> > +;                                      const uint8_t *src2, ptrdiff_t stride2);
> > +;-------------------------------------------------------------------------------
> > +%macro SAD_XMM_16x16 1
> > +INIT_XMM sse2
> > +cglobal pixelutils_sad_%1_16x16, 4,4,3, src1, stride1, src2, stride2
> > +    pxor        m2, m2
> > +%rep 8
> > +    mov%1       m0, [src2q]
> > +    mov%1       m1, [src2q + stride2q]
> > +    psadbw      m0, [src1q]
> > +    psadbw      m1, [src1q + stride1q]
> > +    paddw       m2, m0
> > +    paddw       m2, m1
> > +    lea         src1q, [src1q + 2*stride1q]
> > +    lea         src2q, [src2q + 2*stride2q]
> > +%endrep
> > +    movhlps     m0, m2
> > +    paddw       m2, m0
> > +    movd        eax, m2
> > +    RET
> > +%endmacro
>
> there are various improvements possible, though these should be in
> a separate patch and not in gcc->yasm but
> the pxor can be avoided by lifting the first iteration out and
> using m2 as destination
>
> it might be faster to use 2 accumulator registers as that way both
> could execute with no dependencies on the other
>
> as you unroll the loop, addressing can be done with fewer instructions
>

I left the ASM as is since it was kind of simple and parallel to the API
itself; we can iterate from here with benchmarks.

> LGTM otherwise
>

Patchset applied, thanks.

[...]

-- 
Clément B.
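
For readers of the archive, here is a minimal sketch of what "maximal and
minimal difference cases" can look like for a 16x16 SAD. It is not the test
that was committed: sad_ref_16x16() and check_extreme_cases() are
hypothetical names, and the scalar reference only assumes the
(src1, stride1, src2, stride2) signature shown in the quoted comment.
Identical blocks must give 0, and an all-0 block against an all-255 block
must give 16*16*255 = 65280. The maximal case is also the one that stresses
the paddw accumulation in the quoted macro, since 65280 still fits in
16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar reference: sum of absolute differences over a
 * 16x16 block. Not the FFmpeg implementation, only an illustration. */
static int sad_ref_16x16(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int x, y, sum = 0;

    for (y = 0; y < 16; y++) {
        for (x = 0; x < 16; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

/* Returns the number of failed checks (0 means every case passed). */
static int check_extreme_cases(void)
{
    uint8_t a[16 * 16], b[16 * 16];
    int failures = 0;

    /* Minimal difference: identical blocks must give a SAD of 0. */
    memset(a, 0x55, sizeof(a));
    memset(b, 0x55, sizeof(b));
    failures += sad_ref_16x16(a, 16, b, 16) != 0;

    /* Maximal difference: every pixel differs by 255. */
    memset(a, 0x00, sizeof(a));
    memset(b, 0xFF, sizeof(b));
    failures += sad_ref_16x16(a, 16, b, 16) != 16 * 16 * 255;

    /* SAD is symmetric, so swapping the inputs must give the same value. */
    failures += sad_ref_16x16(b, 16, a, 16) != 16 * 16 * 255;

    return failures;
}

#ifdef SAD_SKETCH_MAIN
int main(void)
{
    assert(check_extreme_cases() == 0);
    return 0;
}
#endif
```

Building with -DSAD_SKETCH_MAIN gives a standalone program. To check an
optimized routine the same way, one would compare its return value against
sad_ref_16x16() on the same buffers.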
