On 28 Apr 2022, at 21:50, Martin Storsjö wrote:
> [...]
> Compared with the previously applied (and reverted) patch, here, you
> previously had "mov x17, #4". I guess that'd mean the function only ever
> produced 8 output rows, while it now uses the real height parameter? Was this
> change a no-op (height is always 8?) or was this another hidden bug in the
> previous implementation?
>
Yes, this was another bug in the previous implementation, which I've fixed in both
of the newer versions.
>> [...]
>> + sqxtun v6.8b, v20.8h
>> + sqxtun v7.8b, v21.8h
>> + st1 {v6.8b}, [ x0], x2
>> + st1 {v7.8b}, [x16], x2
>> + subs x17, x17, #1
>
> This could be "subs w6, w6, #2" and you wouldn't need the lsr instruction at
> all. And you could place the subs before the two st1 instructions to reduce
> latency between them a little. (The same thing goes for moving subs further
> away from the branch that uses its outcome in the previous patch too.) But as
> this is just a reapply of a previously committed and reverted patch, I guess
> it's fine this way too...
I'll make that change before applying if you're fine with it; it's not a complex change.
> The patchset otherwise looks good to me, modulo the question about the
> difference to the previous patchset above.
--
J. Dekker
_______________________________________________
ffmpeg-devel mailing list
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel