On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Michael Hope <michael.h...@linaro.org> writes:
>> I put a build harness around libav and gathered some profiling data.  See:
>>  bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
>>
>> It includes a Makefile that builds a C only, h.264 only decoder and
>> two Creative Commons licensed videos to use as input.
>
> Thanks for putting this together.
>
>> README.rst has the basic commands for running ffmpeg and initial perf
>> results showing the hot functions.  Dave, 20 % of the time is spent in
>> memcpy() so you might want to have a look.
>>
>> The vectoriser has no effect.  GCC 4.5 is ~17 % faster than 4.6.  I'll
>> look into extracting and harnessing the functions themselves later
>> this week.
>
> I had a look at why auto-vectorisation wasn't having much effect.
> It looks from your profile that most of the hot functions are
> operating on 16x16 blocks of pixels with an unknown line stride.
> So the C code looks like:
>
>    for (i = 0; i < 16; i++)
>      {
>        x[0] = OP (x[0]);
>        ...
>        x[15] = OP (x[15]);
>        x += stride;
>      }
>
> Because of the unknown stride, we're relying on SLP rather than
> loop-based vectorisation to handle this kind of loop.  The problem
> is that SLP is being run _as_ a loop optimisation.  At the moment,
> the gimple data-ref analysis code assumes that, during a loop
> optimisation, only simple induction variables are of interest,
> so it treats all of the x[...] references above as unrepresentable.
> If I move SLP outside the loop optimisations (just as a proof of concept),
> then that problem goes away.
>
> I talked about this with Ira, who said that SLP had been placed
> where it is because ivopts (a later loop optimisation) obfuscates
> things too much.  As Ira said, we should probably look at (conditionally)
> removing the assumption that only IVs are of interest during loop
> optimisations.
>
> Another problem is that SLP supports a much smaller range of
> optimisations than the loop-based vectoriser.  There's no support
> for promotion, demotion, or conditional expressions.  This affects
> things like the weight_h264_pixels* functions, which contain
> conditional moves.

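To make the loop shape Richard describes concrete, here is a minimal sketch, not code from libav. The block is shrunk from 16x16 to 4x4 for brevity, and OP, block_op() and block_op_demo() are hypothetical names chosen for illustration. The statements inside the body touch adjacent bytes, which is what SLP would group, while the row step is only known at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pixel operation; the real OP differs per function. */
#define OP(v) ((uint8_t)(255 - (v)))

/* Process a 4x4 block. 'stride' is the distance in bytes between rows
   and is not known at compile time. */
void block_op(uint8_t *x, int stride)
{
    assert(stride >= 4);
    for (int i = 0; i < 4; i++) {
        x[0] = OP(x[0]);   /* four adjacent bytes per row: */
        x[1] = OP(x[1]);   /* candidates for one vector op */
        x[2] = OP(x[2]);
        x[3] = OP(x[3]);
        x += stride;       /* step to the next row */
    }
}

/* Returns 1 if only the 4x4 block inside a stride-6 buffer was modified. */
int block_op_demo(void)
{
    uint8_t buf[4 * 6] = {0};
    block_op(buf, 6);
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 6; c++)
            if (buf[r * 6 + c] != (c < 4 ? 255 : 0))
                return 0;
    return 1;
}
```

The two padding bytes at the end of each row stay untouched, which is why the rows cannot simply be treated as one contiguous array.
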
I had a poke about.  GCC isn't too happy about unrolled loops either.
put_h264_chroma_mc8_8_c() is defined via a macro in dsputil_template.c
and is manually unrolled by eight as:

        for(i=0; i<h; i++){\
            OP(dst[0], (A*src[0] + B*src[1] + C*src[stride+0] + D*src[stride+1]));\
            OP(dst[1], (A*src[1] + B*src[2] + C*src[stride+1] + D*src[stride+2]));\
            OP(dst[2], (A*src[2] + B*src[3] + C*src[stride+2] + D*src[stride+3]));\
            OP(dst[3], (A*src[3] + B*src[4] + C*src[stride+3] + D*src[stride+4]));\
            OP(dst[4], (A*src[4] + B*src[5] + C*src[stride+4] + D*src[stride+5]));\
            OP(dst[5], (A*src[5] + B*src[6] + C*src[stride+5] + D*src[stride+6]));\
            OP(dst[6], (A*src[6] + B*src[7] + C*src[stride+6] + D*src[stride+7]));\
            OP(dst[7], (A*src[7] + B*src[8] + C*src[stride+7] + D*src[stride+8]));\
            dst+= stride;\
            src+= stride;\
        }\

where OP is an assignment.

Reducing this to:

#include <stdint.h>

#define A 3
#define B 4

void unrolled(uint8_t * __restrict dst, uint8_t * __restrict src, int h)
{
    h /= 8;
    for (int i = 0; i < h; i++) {
        dst[0] = A*src[0] + B*src[0+1];
        dst[1] = A*src[1] + B*src[1+1];
        dst[2] = A*src[2] + B*src[2+1];
        dst[3] = A*src[3] + B*src[3+1];
        dst[4] = A*src[4] + B*src[4+1];
        dst[5] = A*src[5] + B*src[5+1];
        dst[6] = A*src[6] + B*src[6+1];
        dst[7] = A*src[7] + B*src[7+1];
        dst += 8;
        src += 8;
    }
}

void plain(uint8_t * __restrict dst, uint8_t * __restrict src, int h)
{
    for (int i = 0; i < h; i++) {
        dst[i] = A*src[i] + B*src[i+1];
    }
}

plain() gets vectorised where unrolled() doesn't.
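
As a sanity check that the two reductions really compute the same thing, so that any difference comes from the vectoriser alone, here is a minimal sketch. The buffer length N, the fill pattern and outputs_match() are assumptions made for illustration; the two functions are copied from above with a const qualifier added on src.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define A 3
#define B 4
#define N 64   /* hypothetical buffer length, a multiple of 8 */

static void unrolled(uint8_t * __restrict dst, const uint8_t * __restrict src, int h)
{
    h /= 8;
    for (int i = 0; i < h; i++) {
        dst[0] = A*src[0] + B*src[1];
        dst[1] = A*src[1] + B*src[2];
        dst[2] = A*src[2] + B*src[3];
        dst[3] = A*src[3] + B*src[4];
        dst[4] = A*src[4] + B*src[5];
        dst[5] = A*src[5] + B*src[6];
        dst[6] = A*src[6] + B*src[7];
        dst[7] = A*src[7] + B*src[8];
        dst += 8;
        src += 8;
    }
}

static void plain(uint8_t * __restrict dst, const uint8_t * __restrict src, int h)
{
    for (int i = 0; i < h; i++)
        dst[i] = A*src[i] + B*src[i+1];
}

/* Returns 1 when both versions write identical bytes, 0 otherwise. */
int outputs_match(void)
{
    uint8_t src[N + 1];   /* one extra byte: both loops read src[N] */
    uint8_t a[N], b[N];

    assert(N % 8 == 0);
    for (int i = 0; i <= N; i++)
        src[i] = (uint8_t)(i * 7 + 1);   /* arbitrary test pattern */
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    unrolled(a, src, N);
    plain(b, src, N);
    return memcmp(a, b, N) == 0;
}
```

Building the two functions with -O3 -ftree-vectorizer-verbose=2 (or -fopt-info-vec on newer GCC releases) should report which loops the vectoriser handled.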

-- Michael
