On Mon, Mar 02, 2026 at 04:51:12PM +0100, Christian König wrote:
> On 3/2/26 16:40, Shakeel Butt wrote:
> > +TJ
> > 
> > On Mon, Mar 02, 2026 at 03:37:37PM +0100, Christian König wrote:
> >> On 3/2/26 15:15, Shakeel Butt wrote:
> >>> On Wed, Feb 25, 2026 at 10:09:55AM +0100, Christian König wrote:
> >>>> On 2/24/26 20:28, Dave Airlie wrote:
> >>> [...]
> >>>>
> >>>>> This has been a pain in the ass for desktop for years, and I'd like to
> >>>>> fix it; the HPC use case is purely a driver for me doing the work.
> >>>>
> >>>> Wait a second. How does accounting to cgroups help with that in any way?
> >>>>
> >>>> The last time I looked into this problem the OOM killer worked based on 
> >>>> the per task_struct stats which couldn't be influenced this way.
> >>>>
> >>>
> >>> It depends on the context of the oom-killer. If the oom-killer is
> >>> triggered due to memcg limits, then only the processes in the scope of
> >>> the memcg will be targeted by the oom-killer. With the specific setting,
> >>> the oom-killer can kill all the processes in the target memcg.
> >>>
> >>> However, nowadays a userspace oom-killer is preferred over the kernel
> >>> oom-killer due to its flexibility and configurability. Userspace
> >>> oom-killers like systemd-oomd, Android's LMKD or fb-oomd are being used
> >>> in containerized environments. Such oom-killers look at memcg stats, so
> >>> hiding something from memcg, i.e. not charging it to memcg, will hide
> >>> that usage from these oom-killers.
> >>
> >> Well exactly that's the problem. Android's oom killer is *not* using memcg 
> >> exactly because of this inflexibility.
> > 
> > Are you sure Android's oom killer is not using memcg? From what I see in the
> > documentation [1], it requires memcg.
> 
> My bad, I should have worded that better.
> 
> The Android OOM killer is not using memcg for tracking GPU memory 
> allocations, because memcg doesn't have proper support for tracking shared 
> buffers.

Yes, indeed memcg is bad with buffers shared between memcgs (shmem, shared
filesystems).

> 
> In other words, GPU memory allocations are shared by design, and it is the
> norm that the process using a buffer is not the process which allocated it.

Here the GPU memory can be system memory or the actual memory on the GPU, right?

I think I discussed with TJ the possibility of moving the charge for such
allocations to the process using them, through a custom fault handler in the
GPU drivers. I don't remember the conclusion, but I am assuming that is not
possible.

> 
> What we would need (as a start) to handle all of this with memcg would be to
> account the resources to the process which references them and not the one
> which allocated them.

Irrespective of the memcg charging decision, one of my requests would be to at
least have global counters for the GPU memory, which this series is adding.
That would be very similar to NR_KERNEL_FILE_PAGES, where we explicitly opt
out of memcg charging but keep the global counter, so the admin can identify
the reasons behind high unaccounted memory on the system.
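To illustrate why a global counter helps even without memcg charging, here is
a minimal userspace sketch of how an admin tool might use such a counter to
explain an otherwise unaccounted memory gap. The "GpuMemory" field and all
numbers are made-up sample data (not real kernel output); the parsing follows
the /proc/meminfo "Field: value kB" format:

```python
# Hedged sketch: attribute an unexplained memory gap to a hypothetical
# global GPU-memory counter ("GpuMemory"), modeled on /proc/meminfo
# entries like KernelStack. All values below are invented sample data.

SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:         2048000 kB
Cached:          4096000 kB
AnonPages:       6144000 kB
Slab:             512000 kB
KernelStack:       32000 kB
GpuMemory:       3552000 kB
"""

def parse_meminfo(text):
    """Parse 'Field: value kB' lines into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        fields[name] = int(rest.split()[0])
    return fields

def unaccounted_kb(info, known=("MemFree", "Cached", "AnonPages",
                                "Slab", "KernelStack")):
    """Memory not explained by the usual per-type counters."""
    return info["MemTotal"] - sum(info[k] for k in known)

info = parse_meminfo(SAMPLE_MEMINFO)
gap = unaccounted_kb(info)
# Without a GPU counter the admin only sees an unexplained gap;
# with one, the gap becomes attributable:
print(f"unaccounted: {gap} kB, of which GPU: {info['GpuMemory']} kB")
```

In this sample the entire 3552000 kB gap is covered by the GPU counter; on a
real system the counter would only narrow down where the missing memory went.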

> 
> I can give a full list of requirements which would be needed by cgroups to 
> cover all the different use cases, but it basically means tons of extra 
> complexity.
> 
> Regards,
> Christian.
> 
> > 
> > [1] https://source.android.com/docs/core/perf/lmkd
> > 
> >>
> >> See the multiple iterations we already had on that topic. Even including 
> >> reverting already upstream uAPI.
> >>
> >> The latest incarnation is that BPF is used for this task on Android.
> >>
> >> Regards,
> >> Christian.
> 
