On 22 June 2014 20:16, Peter Frühberger <[email protected]> wrote:
> Hi
>
> 2014-06-22 11:08 GMT+02:00 Jean-Yves Avenard <[email protected]>:
>> On 22 June 2014 19:06, Jean-Yves Avenard <[email protected]> wrote:
> I tested the sse4 copy algorithm vs the OpenGL approach we discussed
> lately. In my testing I used a 1080p24 sample with H264 Level
> 4.1@High. The average copy time of sse4 was around 4ms. I benchmarked
> similarly to your tests, see the patch here:
> http://paste.ubuntu.com/7684464/
>
> On the other hand I benchmarked the OpenGL approach. This approach
> won by more than a factor of 5, with around 0.8ms per frame.

You are not testing what I intended to test. Here you are testing an
NV12 frame and vaDeriveImage.

What I intended to show was that, via vaGetImage, not using USWC memory
is *much* faster, and that speed-wise you are much better off using
vaGetImage instead of vaDeriveImage.

Obviously that advantage would shrink a lot if the patch by Haihao is
applied.

Whatever speed gain you are noticing with vaDeriveImage SSE vs OpenGL
would be even greater had the memory not been USWC.

I should point out that AMD's VAAPI doesn't support vaDeriveImage, so
you must implement both methods regardless.

> Note: I also measured vaSyncSurface as you did the same, but it has
> nothing to do with the "time" the copy needs, though querying if the
> surface is "not in use" anymore is not really doable for all vaapi
> implementations.

Seeing that the timing in both instances is for exactly the same
instructions, I'm not sure how that would help prove anything.

I would still expect to see vaGetImage with normal memory being much
faster than vaGetImage with USWC memory.

_______________________________________________
Libva mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libva
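
For reference, here is a minimal sketch of the kind of per-frame copy timing being compared in this thread. It is not the code from the patch linked above. The names copy_nv12 and avg_copy_ms are hypothetical, and the sketch assumes an NV12 layout whose UV plane directly follows the Y plane with the same pitch; a real VAImage reports its own pitches and offsets. In actual use the source pointer would come from vaMapBuffer on an image obtained with vaDeriveImage or vaGetImage. Run on a plain malloc'd buffer it only measures cached memory, so it cannot reproduce the USWC numbers by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Copy one NV12 frame from pitched planes into a tightly packed buffer.
 * dst must hold width * height * 3 / 2 bytes. */
static void copy_nv12(uint8_t *dst,
                      const uint8_t *y_src, size_t y_pitch,
                      const uint8_t *uv_src, size_t uv_pitch,
                      size_t width, size_t height)
{
    /* NV12 subsamples chroma 2x2, so both dimensions must be even. */
    assert(width % 2 == 0 && height % 2 == 0);

    /* Y plane: height rows of width bytes. */
    for (size_t row = 0; row < height; row++)
        memcpy(dst + row * width, y_src + row * y_pitch, width);

    /* Interleaved UV plane: height / 2 rows of width bytes. */
    uint8_t *uv_dst = dst + width * height;
    for (size_t row = 0; row < height / 2; row++)
        memcpy(uv_dst + row * width, uv_src + row * uv_pitch, width);
}

/* Average CPU time in milliseconds for one frame copy over `frames` runs.
 * Assumption: the UV plane starts at src + pitch * height. */
static double avg_copy_ms(const uint8_t *src, size_t pitch,
                          size_t width, size_t height,
                          uint8_t *dst, int frames)
{
    assert(frames > 0);

    clock_t start = clock();
    for (int i = 0; i < frames; i++)
        copy_nv12(dst, src, pitch, src + pitch * height, pitch,
                  width, height);
    clock_t end = clock();

    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / frames;
}
```

Calling avg_copy_ms once with a pointer mapped from a vaDeriveImage image and once with one from a vaGetImage image, on the same surface and frame count, would give the two figures to compare. vaSyncSurface and the map/unmap calls are best kept outside the timed loop so that only the copy itself is measured.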
