12 Jul 2021, 13:53 by [email protected]:

> On 7/12/2021 7:46 AM, Lynne wrote:
>
>> 12 Jul 2021, 11:29 by [email protected]:
>>
>>> On Fri, Jun 25, 2021 at 1:24 PM Alan Kelly <[email protected]> wrote:
>>>
>>>> On Fri, Jun 25, 2021 at 10:40 AM Lynne <[email protected]> wrote:
>>>>
>>>>> Jun 25, 2021, 09:54 by [email protected]:
>>>>>
>>>>>> Broadwell and later and Zen3 and later have fast gather instructions.
>>>>>> ---
>>>>>> Gather requires between 9 and 12 cycles on Haswell, 5 to 7 on Broadwell,
>>>>>> and 2 to 5 on Skylake and newer. It is also slow on AMD before Zen 3.
>>>>>> libavutil/cpu.h | 2 ++
>>>>>> libavutil/x86/cpu.c | 18 ++++++++++++++++--
>>>>>> libavutil/x86/cpu.h | 1 +
>>>>>> 3 files changed, 19 insertions(+), 2 deletions(-)
>>>>>>
>>>>>
>>>>> No, we really don't need more FAST/SLOW flags, especially for
>>>>> something like this which is just fixable by _not_using_vgather_.
>>>>> Take a look at libavutil/x86/tx_float.asm, we only use vgather
>>>>> if it's guaranteed to either be faster for what we're gathering or
>>>>> is just as fast "slow". If neither is true, we use manual lookups,
>>>>> which is actually advantageous since for AVX2 we can interleave
>>>>> the lookups that happen in each lane.
>>>>>
>>>>> Even if we disregard this, I've extensively benchmarked vgather
>>>>> on Zen 3, Zen 2, Cascade Lake and Skylake, and there's hardly
>>>>> a great vgather improvement to be found in Zen 3 to justify
>>>>> using a new CPU flag for this.
>>>>> _______________________________________________
>>>>> ffmpeg-devel mailing list
>>>>> [email protected]
>>>>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>>>>
>>>>> To unsubscribe, visit link above, or email
>>>>> [email protected] with subject "unsubscribe".
>>>>>
>>>>
>>>> Thanks for your response. I'm not against finding a cleaner way of
>>>> enabling/disabling the code which will be protected by this flag. However,
>>>> the manual lookups solution proposed will not work in this case, the avx2
>>>> version of hscale will only be faster if fast gathers are available,
>>>> otherwise, the ssse3 version should be used.
>>>>
>>>> I haven't got access to a Zen3 so I can't comment on the performance. I
>>>> have tested on a Zen 2 and it is slow. On Broadwell hscale avx2 is about
>>>> 10% faster than the ssse3 version and on Skylake about 40% faster, Haswell
>>>> has similar performance to Zen2.
>>>>
>>>> Is there a proxy which could be used for detecting Broadwell or Skylake
>>>> and later? AVX512 seems too strict as there are Skylake chips without
>>>> AVX512. Thanks
>>>>
>>>
>>> Hi,
>>>
>>> I will paste the performance figures from the thread for the other part of
>>> this patch here so that the justification for this flag is clearer:
>>>
>>>                               Skylake  Haswell
>>> hscale_8_to_15_width4_ssse3     761.2      760
>>> hscale_8_to_15_width4_avx2      468.7      957
>>> hscale_8_to_15_width8_ssse3    1170.7     1032
>>> hscale_8_to_15_width8_avx2      865.7     1979
>>> hscale_8_to_15_width12_ssse3   2172.2     2472
>>> hscale_8_to_15_width12_avx2    1245.7     2901
>>> hscale_8_to_15_width16_ssse3   2244.2     2400
>>> hscale_8_to_15_width16_avx2    1647.2     3681
>>>
>>> As you can see, it is catastrophic on Haswell and older chips but the gains
>>> on Skylake are impressive.
>>> As I don't have performance figures for Zen 3, I can disable this feature
>>> on all cpus apart from Broadwell and later as you say that there is no
>>> worthwhile improvement on Zen3. Is this OK with you?
>>>
>>
>> It's not that catastrophic. Since Haswell CPUs generally don't have
>> large AVX2 gains, could you just exclude Haswell only from
>> EXTERNAL_AVX2_FAST, and require EXTERNAL_AVX2_FAST
>> to enable those functions?
>>
>
> And disable all non gather AVX2 asm functions on Haswell? No. And it's a lie
> that Haswell doesn't have large gains with AVX2.
>

It won't disable ALL of the AVX2, but it'll affect a few random
components, the most prominent of which is some (not all) hevc assembly.

But I think I'd rather just not do anything at all. Performance of
vgather even on Haswell is still above 2x the C version, and we barely
have any vgathers in our code. And Haswell use is in decline too.
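
For reference, the hscale figures quoted earlier in the thread can be turned into AVX2-vs-SSSE3 ratios, which makes the disagreement over "catastrophic" easier to judge. This is only a rough sketch: the numbers are copied from the quoted table, the thread gives no unit so lower is assumed to be better (timings), and the names `FIGURES`, `avx2_speedup` and `speedups` are made up for illustration.

```python
# Figures copied from the quoted table, as (ssse3, avx2) pairs keyed by width.
# Assumption: these are timings where lower is better; the thread gives no unit.
FIGURES = {
    "Skylake": {
        4: (761.2, 468.7),
        8: (1170.7, 865.7),
        12: (2172.2, 1245.7),
        16: (2244.2, 1647.2),
    },
    "Haswell": {
        4: (760.0, 957.0),
        8: (1032.0, 1979.0),
        12: (2472.0, 2901.0),
        16: (2400.0, 3681.0),
    },
}


def avx2_speedup(ssse3, avx2):
    """Return how many times faster AVX2 is than SSSE3 (below 1.0 means slower)."""
    return ssse3 / avx2


def speedups(figures):
    """Map each CPU and width to its AVX2-vs-SSSE3 ratio."""
    return {
        cpu: {width: avx2_speedup(*pair) for width, pair in rows.items()}
        for cpu, rows in figures.items()
    }


if __name__ == "__main__":
    for cpu, rows in speedups(FIGURES).items():
        for width, ratio in rows.items():
            print(f"{cpu:8s} width{width:<2d} avx2 vs ssse3: {ratio:.2f}x")
```

On these numbers the AVX2 version comes out ahead of SSSE3 at every width on Skylake and behind it at every width on Haswell. The table has no C baseline, so it says nothing about the "above 2x the C version" comparison.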
