dmem: Add reclaim callback for lowering max below current usage

sashiko-bot Thu, 11 Jun 2026 07:40:53 -0700

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:


New issues:
- [Medium] The docstring for `struct dmem_cgroup_ops->reclaim` explicitly 
contradicts the implementation for `-ENOSPC` return values.

Pre-existing issues:
- [High] On 32-bit systems, `dmemcg` limits tracked in bytes are truncated to 
32-bit `unsigned long`, leading to overflows and artificially severe 
constraints.
--

commit ccfc580d266b8b80afec154f819853307351d98a
Author: Thomas Hellström <[email protected]>

cgroup/dmem: Add reclaim callback for lowering max below current usage

Add an optional reclaim callback to struct dmem_cgroup_region. When
dmem.max is set below the current usage of a cgroup pool, the new limit
is applied immediately and then the driver is asked to evict memory.

> diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
> index d9eab8a2c1ee6..d705e94d8784d 100644
> --- a/include/linux/cgroup_dmem.h
> +++ b/include/linux/cgroup_dmem.h
> @@ -14,12 +14,33 @@ struct dmem_cgroup_pool_state;
>  /* Opaque definition of a cgroup region, used internally */
>  struct dmem_cgroup_region;
>  
> +/**
> + * struct dmem_cgroup_ops - Operations for a dmem cgroup region.
> + * @reclaim: Optional callback invoked when dmem.max is set below the current
> + *           usage of a pool. The driver should attempt to free at least
> + *           @target_bytes from @pool. May be called multiple times if usage
> + *           remains above the limit after returning.
> + *
> + *           Return: 0 if some progress was made (even if less than
> + *           @target_bytes was freed), -ENOSPC if no progress could be made,
> + *           or another negative error code if a fatal error occurred.
> + *           Any non-zero return stops further reclaim attempts.

[Severity: Medium]
Is the documentation accurate regarding the -ENOSPC return code?

The docstring here explicitly states that any non-zero return stops further
reclaim attempts. However, the implementation in set_resource_max() appears
to intercept -ENOSPC and will retry up to 16 times:

    if (ret && (ret != -ENOSPC || !retries--))
        break;

Could this mislead driver authors into assuming -ENOSPC is a terminal
return code that aborts the loop immediately, when it actually subjects
the driver to continuous retries?

> + */
> +struct dmem_cgroup_ops {
> +     int (*reclaim)(struct dmem_cgroup_pool_state *pool,
> +                    u64 target_bytes, void *priv);
> +};

[ ... ]

> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 373ad04d289d5..f7166b8255c7b 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c

[ ... ]

>  static void
> -set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
> +set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
>  {
> -     page_counter_set_max(&pool->cnt, val);
> +     struct dmem_cgroup_region *region = pool->region;
> +     unsigned long limit = (unsigned long)val;

[Severity: High]
This is a pre-existing issue, but does this truncation to unsigned long cause
severe limitations on 32-bit systems?

The dmemcg subsystem accounts for memory limits in bytes, but integrates with
the page_counter infrastructure which tracks limits using unsigned long. The
64-bit byte limit is truncated here, wrapping values larger than 4GB on 32-bit
architectures.

Additionally, when setting the limit to 'max' via dmemcg_parse_limit() or
reset_all_resource_limits(), it translates to PAGE_COUNTER_MAX (which is
~524,287 on 32-bit). Because dmem_cgroup_try_charge() passes the allocation
size in bytes to page_counter_try_charge(), a limit of 'max' restricts usage
to ~512KB instead of being unlimited. Will this cause 32-bit workloads to OOM
on allocations larger than ~512KB?

> +
> +     /*
> +      * Always update the limit, even if usage currently exceeds it.
> +      * Concurrent allocations will be throttled against the new limit
> +      * while reclaim is in progress.
> +      */
> +     xchg(&pool->cnt.max, limit);

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=3

Re: [PATCH v5 3/6] cgroup/dmem: Add reclaim callback for lowering max below current usage

Reply via email to