Dave Airlie wrote:
>> Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent
>> memory, that is, memory that stays cache-coherent also while visible to the
>> GPU. There are HW implementations out there (Poulsbo at least) where this
>> option actually seems to work, although it's considerably slower for things
>> like texturing. It's also a requirement for user bo's, since they will have
>> VMAs that we can't kill and remap.
>>
>
> Most PCIE cards will be cache-coherent, however AGP cards not so much, so I
> need to think about whether a generic _CACHED makes sense, especially for
> something like radeon. Will I have to pass different flags depending on the
> GART type? ... this seems like uggh.. so maybe a separate flag makes more
> sense..
>

OK. We're using this functionality in Poulsbo, so we should probably sort
this out to avoid breaking things. (There's a small sketch below of the flag
selection I have in mind.)

>> Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean
>> DRM_BO_FLAG_MAPPED_CACHED, to implement the behaviour you describe? This
>> would also indicate that the buffer cannot be used with user-space
>> sub-allocators, since in that case we must be able to guarantee that the
>> CPU can access parts of the buffer while other parts are validated for
>> the GPU.
>>
>
> Yes, to be honest sub-allocators should be avoided for most use-cases if
> possible; we should be able to make the kernel interface fast enough for
> most things if we don't have to switch caching flags on the fly at
> map/destroy etc..
>

Yes, Eric seems to have the same opinion. I'm not quite sure I understand
the reasoning behind it. Is it the added complexity, or something else?
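To pin down the two semantics, here's roughly how I'd expect a driver to
choose between the flags. This is only a sketch: bo_caching_flags(), the
gart_info struct and the flag values are invented for illustration, not the
actual drm interface.

/*
 * Illustrative only: the helper, the struct and the flag values are
 * made-up names for this example, not the real drm interface.
 */
#include <stdint.h>

#define DRM_BO_FLAG_CACHED        (1ULL << 19)  /* placeholder value */
#define DRM_BO_FLAG_MAPPED_CACHED (1ULL << 20)  /* placeholder value */

struct gart_info {
    int coherent;   /* PCIE-style aperture that snoops the CPU cache? */
};

/* Pick a caching flag without the caller knowing the GART type. */
static uint64_t bo_caching_flags(const struct gart_info *gart)
{
    if (gart->coherent)
        /* Pages can stay CPU-cached even while GPU-visible. */
        return DRM_BO_FLAG_CACHED;

    /*
     * AGP-style aperture: only CPU mappings may be cached; caching
     * policy has to be switched before GPU validation.
     */
    return DRM_BO_FLAG_MAPPED_CACHED;
}

The point being that with a separate MAPPED_CACHED flag, a driver like
radeon could make this choice internally instead of pushing it out to every
caller.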
While it's super to have a fast kernel interface, the inherent latency and
allocation granularity will probably always make a user-space sub-allocator
a desirable thing, particularly something like a slab allocator, which would
also to some extent avoid fragmentation.

My view of TTM has changed to be a bit from the opposite side: let's say we
have a fast user-space per-client allocator. What kernel functionality would
we require to make sure that it can assume it's the sole owner of the memory
it manages?

For a repeated usage pattern like batch buffers, we end up allocating pages
from the kernel, setting up one VMA per buffer, modifying GART and page
tables, and in the worst case even changing the caching policy for each and
every use. Even if this can be made reasonably fast, I think it's a CPU
overhead we really shouldn't be paying.

/Thomas
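P.S. For concreteness, the kind of slab-style sub-allocator I'm thinking of
is sketched below. All names are invented (big_bo_alloc() stands in for
whatever call hands us one large cached buffer object), the free path and
locking are elided, and the sizes are arbitrary:

#include <stddef.h>
#include <stdlib.h>

#define SLAB_SIZE     (128 * 1024)           /* one kernel BO per slab */
#define OBJ_SIZE      (8 * 1024)             /* fixed batch-buffer size */
#define OBJS_PER_SLAB (SLAB_SIZE / OBJ_SIZE) /* 16 objects per slab */

/* Hypothetical wrapper that allocates and maps one large kernel BO. */
extern void *big_bo_alloc(size_t size);

struct slab {
    void *cpu_map;          /* CPU mapping of the kernel BO */
    unsigned free_mask;     /* one bit per free object */
    struct slab *next;
};

static struct slab *partial;    /* slabs that still have free objects */

void *suballoc(void)
{
    struct slab *s = partial;

    if (!s) {
        /*
         * One kernel round-trip (pages, VMA, GART entries, caching
         * policy) amortized over OBJS_PER_SLAB allocations: this is
         * the point of the exercise.
         */
        s = calloc(1, sizeof(*s));
        s->cpu_map = big_bo_alloc(SLAB_SIZE);
        s->free_mask = (1u << OBJS_PER_SLAB) - 1;
        s->next = partial;
        partial = s;
    }

    int i = __builtin_ctz(s->free_mask);    /* first free object */
    s->free_mask &= ~(1u << i);
    if (!s->free_mask)
        partial = s->next;                  /* slab is now full */

    return (char *)s->cpu_map + (size_t)i * OBJ_SIZE;
}

Freeing an object is just setting its bit again and putting the slab back on
the partial list; nothing touches the kernel until a whole slab can be
returned, which is what keeps the per-buffer CPU overhead down.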
