On November 20, 2025 at 11:04, "Roman Gushchin" <[email protected]> wrote:
>
> Hui Zhu <[email protected]> writes:
>
> > From: Hui Zhu <[email protected]>
> >
> > This series proposes adding eBPF support to the Linux memory
> > controller, enabling dynamic and extensible memory management
> > policies at runtime.
> >
> > Background
> >
> > The memory controller (memcg) currently provides fixed memory
> > accounting and reclamation policies through static kernel code.
> > This limits flexibility for specialized workloads and use cases
> > that require custom memory management strategies.
> >
> > By enabling eBPF programs to hook into key memory control
> > operations, administrators can implement custom policies without
> > recompiling the kernel, while maintaining the safety guarantees
> > provided by the BPF verifier.
> >
> > Use Cases
> >
> > 1. Custom memory reclamation strategies for specialized workloads
> > 2. Dynamic memory pressure monitoring and telemetry
> > 3. Memory accounting adjustments based on runtime conditions
> > 4. Integration with container orchestration systems for
> >    intelligent resource management
> > 5. Research and experimentation with novel memory management
> >    algorithms
> >
> > Design Overview
> >
> > This series introduces:
> >
> > 1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
> >    programs to implement custom behavior for memory charging
> >    operations.
> >
> > 2. A hook point in the `try_charge_memcg()` fast path that
> >    invokes registered eBPF programs to determine if custom
> >    memory management should be applied.
> >
> > 3. The eBPF handler can inspect memory cgroup context and
> >    optionally modify certain parameters (e.g., `nr_pages` for
> >    reclamation size).
> >
> > 4. A reference counting mechanism using `percpu_ref` to safely
> >    manage the lifecycle of registered eBPF struct ops instances.
>
> Can you please describe how these hooks will be used in practice?
> What's the problem you can solve with it and can't without?
>
> I generally agree with an idea to use BPF for various memcg-related
> policies, but I'm not sure how specific callbacks can be used in
> practice.

Hi Roman,

Here are some ideas for how eBPF support in memcg could be used:

Priority-based reclaim and limits in multi-tenant environments:
On a single machine with multiple tenants, namespaces, or containers,
it is hard to decide under memory pressure who should be squeezed
first when the policy is static and baked into the kernel. With a BPF
profile assigned to each tenant's memcg, BPF can decide under high
global pressure:
- which memcgs' memory.high should be raised (delaying reclaim),
- which memcgs should be scanned and reclaimed more aggressively.

Online profiling and diagnosis of memory hotspots:
When a cgroup's memory keeps growing, it is difficult to obtain
fine-grained information without patching the kernel. With BPF
attached to the memcg charge/uncharge path, we can record large
allocations (greater than N KB) with call stacks and the owning
file/module, and send them to user space via a BPF ring buffer.
From the sampled data we can generate:
- the top N memory allocation stacks in this container over the
  last 10 minutes,
- reports of which objects or call paths are growing fastest.
This makes it possible to pinpoint the root cause of host memory
anomalies without changing application code, which is very useful
in operations scenarios.

SLO-driven auto throttling and scale-in/out signals:
eBPF can observe the memory usage slope, frequent reclaim, or
near-OOM behavior within a memcg. When it decides that OOM is
imminent, instead of just killing tasks or raising limits, it can
emit a signal to a control-plane component. For example, it can send
an event to a user-space agent to trigger automatic scaling, QPS
adjustment, or throttling.

Preventing a cgroup from launching a large-scale fork+malloc attack:
During memcg charge, BPF checks per-uid or per-cgroup allocation
behavior over the last few seconds.
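
As a rough illustration of the per-cgroup check over "the last few seconds" mentioned above, here is a minimal sketch in plain user-space C. It is not the `memcg_ops` interface from the series. The names `charge_window`, `charge_policy`, and `charge_allowed` are hypothetical, and a real BPF program would presumably keep this state in a BPF map keyed by cgroup ID rather than in an ordinary struct. It uses a simple fixed-window budget, which is one possible policy among many.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cgroup state (would be a BPF map value in practice). */
struct charge_window {
	uint64_t window_start_ns;	/* start of the current accounting window */
	uint64_t pages_charged;		/* pages charged so far in this window */
};

/* Hypothetical policy knobs, e.g. set by a user-space agent. */
struct charge_policy {
	uint64_t window_ns;		/* window length in nanoseconds */
	uint64_t max_pages_per_window;	/* charge budget per window */
};

/*
 * Return true if a charge of nr_pages fits the budget of the current
 * window, false if the caller should throttle or reject it.
 */
static bool charge_allowed(struct charge_window *w,
			   const struct charge_policy *p,
			   uint64_t now_ns, uint64_t nr_pages)
{
	assert(p->window_ns > 0);

	/* Start a fresh window once the previous one has expired. */
	if (now_ns - w->window_start_ns >= p->window_ns) {
		w->window_start_ns = now_ns;
		w->pages_charged = 0;
	}

	/* pages_charged never exceeds the budget, so this cannot wrap. */
	if (nr_pages > p->max_pages_per_window - w->pages_charged)
		return false;

	w->pages_charged += nr_pages;
	return true;
}
```

With a one-second window and a budget of 1024 pages, a burst that charges 1000 pages would have a following 100-page charge refused until the window rolls over. A fixed window keeps the per-charge work constant, which matters on a fast path, at the cost of allowing short bursts across a window boundary.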

I also maintain a software project, https://github.com/teawater/mem-agent,
for specialized memory management and related functions. However, I
found that implementing certain memory QoS categories for memcg purely
from user space is rather inefficient, because it requires frequent
access to values within memcg. This is why I want memcg to support
eBPF: I could then place custom memory management logic directly in
the kernel, which would greatly improve efficiency.

Best,
Hui

>
> Thanks!
>

