Re: [FFmpeg-devel] [aarch64] yuv2planeX - unroll outer loop by 4 to increase performance by 6.3%

Sebastian Pop Thu, 03 Sep 2020 09:59:24 -0700

On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <[email protected]>
wrote:


> faster is better obviously, so if it's tested with odd sizes and arm
> developers had a chance to comment, it should be ok


Hi, I'm looking for feedback from ARM maintainers on the attached patch.
Ok to commit the patch?

Thanks,
Sebastian

On Wed, Aug 19, 2020 at 1:37 PM Sebastian Pop <[email protected]> wrote:

> Thanks Michael for your feedback.
>
> On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer <[email protected]>
> wrote:
>
>> faster is better obviously, so if it's tested with odd sizes and arm
>> developers had a chance to comment, it should be ok
>>
>>
> The current patch was tested with `make check` on Arm64 Graviton2.
> I have also tested randomly selected rescale factors, for example:
>
> ./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1023x42,bench=stop -f null -
>
>
>> one potential improvement is to use the unrolled code for odd widths
>> too and use the non-unrolled code for the end
>>
>
> Done.  Please see the amended patch.
>
> Thanks,
> Sebastian
>
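
The suggestion quoted above (unrolled code for the bulk of the row, non-unrolled code for whatever is left at the end) can be sketched in C as follows. This is only an illustration of the loop shape, not the attached aarch64 assembly patch: `vscale_ref`, `vscale_unrolled`, and `clip_u8` are hypothetical names, the 19-bit shift is an assumed fixed-point scale, and the dither term of the real yuv2planeX routine is left out.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 0..255 range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference version: one output pixel per outer-loop iteration. */
static void vscale_ref(const int16_t *filter, int filter_size,
                       const int16_t **src, uint8_t *dest, int dst_w)
{
    for (int i = 0; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19); /* assumed fixed-point scale */
    }
}

/* Outer loop unrolled by 4, with a non-unrolled loop for the tail,
 * so widths that are not a multiple of 4 still use the fast path
 * for most of the row. */
static void vscale_unrolled(const int16_t *filter, int filter_size,
                            const int16_t **src, uint8_t *dest, int dst_w)
{
    int i = 0;

    /* Main loop: four output pixels per iteration. */
    for (; i + 4 <= dst_w; i += 4) {
        int v0 = 0, v1 = 0, v2 = 0, v3 = 0;
        for (int j = 0; j < filter_size; j++) {
            const int16_t *s = src[j] + i;
            int f = filter[j];
            v0 += s[0] * f;
            v1 += s[1] * f;
            v2 += s[2] * f;
            v3 += s[3] * f;
        }
        dest[i + 0] = clip_u8(v0 >> 19);
        dest[i + 1] = clip_u8(v1 >> 19);
        dest[i + 2] = clip_u8(v2 >> 19);
        dest[i + 3] = clip_u8(v3 >> 19);
    }

    /* Tail loop: the remaining 0..3 pixels, one at a time. */
    for (; i < dst_w; i++) {
        int val = 0;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_u8(val >> 19);
    }
}
```

Comparing the output of the two functions over a range of widths, including ones that are not multiples of 4, is a cheap way to check that the tail handling is right.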

0001-aarch64-yuv2planeX-unroll-outer-loop-by-4-increases-.patch
Description: Binary data
