On Saturday, 2 March 2024, 14:06:13 EET, flow gg wrote:
> Here adjusting the order, rather than simply using .rept, will be 13%-24%
> faster.

Wouldn't it also be faster to use the maximum LMUL for the adds here?

Also, this might not be very noticeable on the C908, but avoiding sequential
dependencies on the address registers may help. That is, avoid using as an
address operand a value that was computed by the immediately preceding
instruction.
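
As a rough illustration of the address-register point only: the sketch below
assumes four rows are loaded from a base pointer in a0 with the row stride in
a1, and that vl/vtype were already set by an earlier vsetvli. The registers,
row count and element width are made up for the example and are not taken
from the patch.

```
        # Dependent form: each load address comes from the instruction
        # immediately before the load.
        vle8.v  v8, (a0)
        add     a0, a0, a1
        vle8.v  v9, (a0)
        add     a0, a0, a1
        vle8.v  v10, (a0)
        add     a0, a0, a1
        vle8.v  v11, (a0)

        # Reordered form: addresses are computed ahead of time in scratch
        # registers, so no load uses a value produced by the instruction
        # immediately before it.
        add     t0, a0, a1        # t0 = address of row 1
        slli    t2, a1, 1         # t2 = 2 * stride
        vle8.v  v8, (a0)          # row 0
        add     t1, a0, t2        # t1 = address of row 2
        vle8.v  v9, (t0)          # row 1
        add     t3, t0, t2        # t3 = address of row 3
        vle8.v  v10, (t1)         # row 2
        vle8.v  v11, (t3)         # row 3
```

Unlike the first form, the second leaves a0 unchanged and uses three extra
scratch registers. Whether it is measurably faster depends on the core and
would need benchmarking.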

-- 
Rémi Denis-Courmont
http://www.remlab.net/



