On Tue, May 19, 2026 at 9:20 AM Christian König <[email protected]> wrote:
>
> On 5/19/26 01:39, T.J. Mercier wrote:
> > On Mon, May 18, 2026 at 7:07 AM Christian König
> > <[email protected]> wrote:
> >>
> >> On 5/18/26 14:50, Albert Esteve wrote:
> >>> On Mon, May 18, 2026 at 9:20 AM Christian König
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 5/15/26 19:06, T.J. Mercier wrote:
> >>>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> >>>>>>> On embedded platforms a central process often allocates dma-buf
> >>>>>>> memory on behalf of client applications. Without a way to
> >>>>>>> attribute the charge to the requesting client's cgroup, the
> >>>>>>> cost lands on the allocator, making per-cgroup memory limits
> >>>>>>> ineffective for the actual consumers.
> >>>>>>>
> >>>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> >>>>>>
> >>>>>> Please be aware that pidfds come in two flavors:
> >>>>>>
> >>>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
> >>>>>> doesn't implicitly depend on this distinction not existing.
> >>>>>
> >>>>> Hi Christian,
> >>>>>
> >>>>> Memcg is not a controller that supports "thread mode" so all threads
> >>>>> in a group should belong to the same memcg.
> >>>>
> >>>> BTW: Exactly that is the requirement automotive has with their native
> >>>> context use case.
> >>>>
> >>>> The use case is that you have a daemon which has multiple threads where
> >>>> each one is acting on behalf of some other process.
> >>>>
> >>>> At the moment we basically say they are simply not using cgroups for
> >>>> that use case, but it would be really nice if we could handle that as
> >>>> well.
> >>>>
> >>>> Summarizing the requirement of that use case: You need a different
> >>>> cgroup for each thread of a process.
> >>>
> >>> Hi Christian,
> >>>
> >>> Thanks for sharing this automotive use case. If I understand correctly,
> >>> the actual requirement is attributing dma-buf charges to the right
> >>> client, not putting each daemon thread in a different cgroup?
> >>
> >> Nope, exactly that's the difference.
> >>
> >> The thread acts as a filtering agent for both memory allocation and
> >> command submission for somebody else. The process on whose behalf the
> >> daemon does things can even be in a client VM, completely remote over some
> >> network or even something like a microcontroller.
> >>
> >> Everything the thread does regarding CPU time, GPU driver memory
> >> allocation as well as resources like GPU processing and I/O time etc.
> >> needs to be accounted to one client which can be different for each thread
> >> of the process.
> >>
> >> The only thing which is shared with the main process thread is CPU memory
> >> resources, e.g. malloc() because that is basically just needed for
> >> housekeeping and pretty much irrelevant for this kind of use case.
> >>
> >> The problem is now you can't do that with cgroups at the moment but
> >> unfortunately only the kernel has the information you need to know to do
> >> this.
> >>
> >> So what you end up with is to define tons of interfaces just to get the
> >> necessary information from the kernel into userspace and then essentially
> >> duplicate the same infrastructure cgroup provides in the kernel in
> >> userspace again.
> >>
> >>> If so,
> >>> the `charge_pid_fd` approach achieves this directly by passing the
> >>> client's `pid_fd`, without needing to add per-thread cgroup
> >>> infrastructure.
> >>
> >> Well it's already a massive improvement, we could basically stop doing the
> >> whole duplication part for the GPU driver stack and just use cgroups for
> >> this part.
> >>
> >> Doing that automatically for CPU and I/O time would just be nice to have
> >> additionally.
> >>
> >> Regards,
> >> Christian.
> >
> > Hopefully I'm following correctly here.... So you are duplicating the
> > GPU driver stack to achieve remote accounting on a per-thread basis?
>
> Not quite, we are duplicating the handling cgroup provides in the kernel in
> userspace.
>
> For this, memory usage information as well as execution times of the GPU
> kernel driver are exposed in fdinfo, for example.
>
> > Does this mean for GPU allocations you currently have some GFP_ACCOUNT
> > magic in your driver to attribute GPU memory to the correct remote
> > client?
>
> No, we just expose what the kernel driver has allocated for itself. E.g. page
> tables, buffers etc.
>
> When userspace allocates something using memfd_create() for example we just
> ignore that.
> >
> > So this series would close the gap for dma-buf allocations,
> > but what about private GPU driver memory allocated on behalf of a
> > client?
>
> Well we would need a cgroup which isn't associated with any process where we
> could charge the GPU driver allocations against.

I think I better understand your framing for this now. Thanks again
for taking the time to explain.

I was looking for a way to pass a cgroup around to do the charge. I
found that `struct cgroup *cgroup_get_from_fd(int fd)` already exists
among the available cgroup symbols to handle cgroup directory fds.

So here's an idea... Rename charge_pid_fd to charge_fd:
- If it is a pidfd (`!IS_ERR(pidfd_pid(fget(charge_fd)))`) then we do
  what we're already doing here.
- If it is a cgroup_fd (`!IS_ERR(cgroup_get_from_fd(charge_fd))`) then
  we charge to that cgroup.

Also we could add an ioctl for the generic fd path similar to
what we have for dma-buf heaps. Or have a new flavour of
memfd_create:
```
memfd_create2(name, flags, charge_fd);
```

The transfer ioctl could also be made generic to accept both pidfds
and cgroup_fds.

For this series we could move forward as is, and make the generic
solution a follow-up series, knowing that the field can be reused for
cgroup fds.

>
> But good point, charging against a pid wouldn't work in this use case.
>
> Regards,
> Christian.
>
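
For illustration only, here is a rough userspace analogue of the
charge_fd dispatch described above. It is not the in-kernel
implementation, which would go through pidfd_pid() and
cgroup_get_from_fd(); this sketch classifies an fd by its filesystem
magic instead. The names charge_kind, charge_kind_from_magic() and
charge_fd_kind() are hypothetical. It assumes cgroup v2 and a kernel
where pidfds live on pidfs (6.9 or later); on older kernels pidfds are
anonymous inodes and this check would not recognise them.

```c
#include <errno.h>
#include <sys/vfs.h>

/* Magic numbers from <linux/magic.h>, defined here in case the
 * installed headers predate them. */
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif
#ifndef PID_FS_MAGIC
#define PID_FS_MAGIC 0x50494446
#endif

/* Hypothetical: the two kinds of fd that charge_fd could accept. */
enum charge_kind {
	CHARGE_KIND_PIDFD,	/* charge the cgroup of the target process */
	CHARGE_KIND_CGROUP,	/* charge the cgroup directory directly */
};

/* Map a filesystem magic number to a charge kind, or -EINVAL. */
static int charge_kind_from_magic(unsigned long magic)
{
	if (magic == PID_FS_MAGIC)
		return CHARGE_KIND_PIDFD;
	if (magic == CGROUP2_SUPER_MAGIC)
		return CHARGE_KIND_CGROUP;
	return -EINVAL;
}

/* Classify an fd. Returns a charge_kind value or a negative errno. */
static int charge_fd_kind(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) < 0)
		return -errno;
	return charge_kind_from_magic((unsigned long)sfs.f_type);
}
```

Any fd that is neither a pidfd nor a cgroup directory is rejected with
-EINVAL, which mirrors the idea that the renamed field accepts exactly
those two kinds and nothing else.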

