On Mon, May 18, 2026 at 7:07 AM Christian König <[email protected]> wrote:
>
> On 5/18/26 14:50, Albert Esteve wrote:
> > On Mon, May 18, 2026 at 9:20 AM Christian König
> > <[email protected]> wrote:
> >>
> >> On 5/15/26 19:06, T.J. Mercier wrote:
> >>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <[email protected]>
> >>> wrote:
> >>>>
> >>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> >>>>> On embedded platforms a central process often allocates dma-buf
> >>>>> memory on behalf of client applications. Without a way to
> >>>>> attribute the charge to the requesting client's cgroup, the
> >>>>> cost lands on the allocator, making per-cgroup memory limits
> >>>>> ineffective for the actual consumers.
> >>>>>
> >>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> >>>>
> >>>> Please be aware that pidfds come in two flavors:
> >>>>
> >>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
> >>>> doesn't implicitly depend on this distinction not existing.
> >>>
> >>> Hi Christian,
> >>>
> >>> Memcg is not a controller that supports "thread mode" so all threads
> >>> in a group should belong to the same memcg.
> >>
> >> BTW: Exactly that is the requirement automotive has with their native
> >> context use case.
> >>
> >> The use case is that you have a daemon which has multiple threads where
> >> each one is acting on behalf of some other process.
> >>
> >> At the moment we basically say they are simply not using cgroups for that
> >> use case, but it would be really nice if we could handle that as well.
> >>
> >> Summarizing the requirement of that use case: You need a different cgroup
> >> for each thread of a process.
> >
> > Hi Christian,
> >
> > Thanks for sharing this automotive use case. If I understand correctly,
> > the actual requirement is attributing dma-buf charges to the right
> > client, not putting each daemon thread in a different cgroup?
>
> Nope, exactly that's the difference.
>
> The thread acts as a filtering agent for both memory allocation and command
> submission for somebody else, the process on whose behalf the daemon does
> things can even be in a client VM, completely remote over some network or
> even something like a microcontroller.
>
> Everything the thread does regarding CPU time, GPU driver memory allocation
> as well as resources like GPU processing and I/O time etc. needs to be
> accounted to one client which can be different for each thread of the process.
>
> The only thing which is shared with the main process thread is CPU memory
> resources, e.g. malloc() because that is basically just needed for
> housekeeping and pretty much irrelevant for this kind of use case.
>
> The problem is now you can't do that with cgroups at the moment but
> unfortunately only the kernel has the information you need to know to do this.
>
> So what you end up with is to define tons of interfaces just to get the
> necessary information from the kernel into userspace and then essentially
> duplicate the same infrastructure cgroup provides in the kernel in userspace
> again.
>
> > If so,
> > the `charge_pid_fd` approach achieves this directly by passing the
> > client's `pid_fd`, without needing to add per-thread cgroup
> > infrastructure.
>
> Well it's already a massive improvement, we could basically stop doing the
> whole duplication part for the GPU driver stack and just use cgroups for this
> part.
>
> Doing that automatically for CPU and I/O time would just be nice to have
> additionally.
>
> Regards,
> Christian.

Hopefully I'm following correctly here... So you are duplicating the GPU
driver stack to achieve remote accounting on a per-thread basis? Does
this mean for GPU allocations you currently have some GFP_ACCOUNT magic
in your driver to attribute GPU memory to the correct remote client?

So this series would close the gap for dma-buf allocations, but what
about private GPU driver memory allocated on behalf of a client?

