On Thu, Jul 10, 2025 at 10:49:19AM +0200, Pavel Machek wrote:
> Hi!
>
> > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> > > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> > > 760p video recording. Plus, copying full-resolution photo buffer takes
> > > more than 200msec!
> > >
> > > There's possibility to do some processing on GPU, and its implemented
> > > here:
> > >
> > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
> > >
> > > but that hits the same problem in the end -- data is in DMA-BUF,
> > > uncached, and takes way too long to copy out.
> > >
> > > And that's ... wrong. DMA ended seconds ago, complete cache flush
> > > would be way cheaper than copying single frame out, and I still have
> > > to deal with uncached frames.
> > >
> > > So I have two questions:
> > >
> > > 1) Is my analysis correct that, no matter how I get frame from v4l and
> > > process it on GPU, I'll have to copy it from uncached memory in the
> > > end?
> >
> > If you need to touch the buffers using the CPU then you are either
> > stuck with uncached memory or you need to implement bracketed access to
> > do the necessary cache maintenance. Be aware that completely flushing
> > the cache is not really an option, as that would impact other
> > workloads, so you have to flush the cache by walking the virtual
> > address space of the buffer, which may take a significant amount of CPU
> > time.
>
> What kind of "significant amount of CPU time" are we talking here?
> Millisecond?

It really depends on the platform, the type of cache, and the size of
the buffer. I remember that back in the N900 days a selective cache
clean of a large buffer for full-resolution images took several tens of
milliseconds, possibly close to 100ms. We had to clean the whole D-cache
to make it fast enough, but you can't always do that, as Lucas
mentioned.

> Bracketed access is fine with me.
>
> Flushing a cache should be an option. I'm root, there's no other
> significant workload, and copying out the buffer takes 200msec+. There
> are lot of cache flushes that can be done in quarter a second!
>
> > However, if you are only going to use the buffer with the GPU I see no
> > reason to touch it from the CPU side. Why would you even need to copy
> > the content? After all dma-bufs are meant to enable zero-copy between
> > DMA capable accelerators. You can simply import the V4L2 buffer into a
> > GL texture using EGL_EXT_image_dma_buf_import. Using this path you
> > don't need to bother with the cache at all, as the GPU will directly
> > read the video buffers from RAM.
>
> Yes, so GPU will read video buffer from RAM, then debayer it, and then
> what? Then I need to store a data into raw file, or use CPU to turn it
> into JPEG file, or maybe run video encoder on it. That are all tasks
> that are done on CPU...

-- 
Regards,

Laurent Pinchart
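
For reference, the "bracketed access" discussed in this thread can be
sketched from userspace with DMA_BUF_IOCTL_SYNC from <linux/dma-buf.h>.
This is only a minimal sketch, not something taken from the thread: the
helper names dmabuf_sync() and dmabuf_copy_out() are hypothetical, and it
assumes the dma-buf fd has already been exported and mmap()ed. Whether
the copy gets any faster depends on the exporter handing out a cached CPU
mapping; if the mapping is uncached, the bracketing is still correct but
does not help with the 20msec/1MB figure.

```c
#include <errno.h>
#include <linux/dma-buf.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying on EINTR/EAGAIN. */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/*
 * Hypothetical helper: copy len bytes out of an already mmap()ed dma-buf.
 * The CPU read is bracketed by SYNC_START / SYNC_END so that the exporter
 * can do whatever cache maintenance its mapping requires.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_copy_out(int dmabuf_fd, const void *map, void *dst,
                           size_t len)
{
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
        return -1;

    memcpy(dst, map, len);

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The START/END pair limits cache maintenance to the one buffer being
read, which is the selective alternative to cleaning the whole D-cache;
how long it takes depends on the platform and the buffer size, as
described above.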
