ping

flow gg <[email protected]> wrote on Fri, Mar 8, 2024 at 17:46:
> Alright, using m8, but for now don't add code to address dependencies in
> loops that have a minor impact. Updated in the reply
>
> Rémi Denis-Courmont <[email protected]> wrote on Fri, Mar 8, 2024 at 17:08:
>
>>
>>
>> On 8 March 2024 02:45:46 GMT+02:00, flow gg <[email protected]>
>> wrote:
>> >> Isn't it also faster to max LMUL for the adds here?
>> >
>> >It requires the use of one more vset, making the time slightly longer:
>> >147.7 (m1), 148.7 (m8 + vset).
>>
>> A variation of 0.6% on a single set of kernels will end up below
>> measurement noise in real overall codec usage. And then reducing the
>> I-cache contention can improve performance in other ways. Larger LMUL
>> should also improve performance on bigger cores with more ALUs. So it's not
>> all black and white.
>>
>> My personal preference is to keep the code small if it makes almost no
>> difference but I'm not BDFL.
>>
>> >Also this might not be much noticeable on C908, but avoiding sequential
>> >dependencies on the address registers may help. I mean, avoid using as
>> >address operand a value that was calculated by the immediate previous
>> >instruction.
>> >
>> >> Okay, but the test results haven't changed..
>> >It would add more than ten lines of code, perhaps shorter code would be
>> >better?
>>
>> I don't know. There are definitely in-order vector cores coming, and data
>> dependencies will hurt them. But I don't know if anyone will care about
>> FFmpeg on those.
>> _______________________________________________
>> ffmpeg-devel mailing list
>> [email protected]
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> [email protected] with subject "unsubscribe".
>>
>
