Thank you for your contribution! Sashiko AI review found 3 potential issue(s) 
to consider:

Pre-existing issues:
- [High] dmem cgroup controller incorrectly tracks bytes using page_counter, 
causing 2GB limit overflow on 32-bit architectures
- [High] Concurrent limit updates can corrupt page counter limits due to lack 
of serialization
- [High] Missing release barrier when publishing pool initialization allows 
bypassing hierarchical limits
--

commit 265d8a6cb2da9910a15e7066ffcacbb4c27c4ff0
Author: Hongfu Li <[email protected]>
Subject: cgroup/dmem: Use size_t for try_charge() and uncharge() size

This commit changes the charge and uncharge APIs to accept a size_t value
for the allocation size rather than a u64. This prevents potential truncation
on 32-bit architectures when the size is passed to the internal page_counter
functions.

> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 4753a67d0f0f2..38108536d2d60 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -649,7 +649,7 @@ EXPORT_SYMBOL_GPL(dmem_cgroup_uncharge);
>   *
>   * Return: 0 on success, -EAGAIN on hitting a limit, or a negative errno on failure.
>   */
> -int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
> +int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, size_t size,
>                         struct dmem_cgroup_pool_state **ret_pool,
>                         struct dmem_cgroup_pool_state **ret_limit_pool)
>  {

[ ... ]

>       if (!page_counter_try_charge(&pool->cnt, size, &fail)) {

[Severity: High]
This is a pre-existing issue, but does using page_counter to track device
memory in bytes cause cumulative overflow on 32-bit architectures?

The page_counter API stores counts using unsigned long and internally
accumulates using atomic_long_add_return(). On 32-bit architectures, where
long is 32 bits wide, this limits the tracking capability to LONG_MAX bytes
(just under 2GB) when the unit is bytes rather than pages.

When cumulative device memory exceeds 2GB, the usage value wraps to negative,
and the signed bounds check in page_counter_try_charge() consistently fails,
causing valid allocations to be permanently rejected.

Similarly, in set_resource_max() and reset_all_resource_limits(), passing a
64-bit byte limit directly to page_counter_set_max() silently truncates
values above 4GB, and setting the default limit to PAGE_COUNTER_MAX restricts
usage to 2GB on 32-bit systems.

Could we track memory in pages instead of bytes to avoid this overflow?
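
To illustrate the difference in headroom, here is a minimal userspace sketch,
not kernel code. It assumes 4 KiB pages and a counter capped at INT32_MAX;
sim_counter and the sim_* helpers are hypothetical names, and the model
rejects at the ceiling instead of wrapping, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define SIM_GIB (UINT64_C(1) << 30)

/* Hypothetical model of a counter whose usage cannot exceed INT32_MAX. */
struct sim_counter {
	int32_t usage;
	int32_t max;
};

/* Charge nr units; reject anything that would push usage past max. */
static bool sim_try_charge(struct sim_counter *c, uint64_t nr)
{
	uint64_t new_usage;

	assert(c->usage >= 0 && c->max >= 0);
	new_usage = (uint64_t)c->usage + nr;
	if (new_usage > (uint64_t)c->max)
		return false;
	c->usage = (int32_t)new_usage;
	return true;
}

/* Round a byte count up to whole pages. */
static uint64_t sim_bytes_to_pages(uint64_t bytes)
{
	return (bytes + (UINT64_C(1) << SIM_PAGE_SHIFT) - 1) >> SIM_PAGE_SHIFT;
}

/* Count how many of `attempts` 1 GiB charges the model accepts. */
static int sim_gib_charges_accepted(bool in_pages, int attempts)
{
	struct sim_counter c = { .usage = 0, .max = INT32_MAX };
	uint64_t unit = in_pages ? sim_bytes_to_pages(SIM_GIB) : SIM_GIB;
	int accepted = 0;

	for (int i = 0; i < attempts; i++)
		if (sim_try_charge(&c, unit))
			accepted++;
	return accepted;
}
```

In this model, sim_gib_charges_accepted(false, 8) accepts only the first
1GB charge, while sim_gib_charges_accepted(true, 8) accepts all eight.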


[Severity: High]
This is a pre-existing issue, but can concurrent limit updates corrupt the
page counter limits?

page_counter_set_max() explicitly requires callers to serialize invocations.
Without serialization, concurrent writes can cause the optimistic
compare-and-swap retry loop to interleave, silently corrupting the maximum
limits and overwriting new limits with stale values.

This happens in dmemcg_limit_write() when applying limits:

kernel/cgroup/dmem.c:dmemcg_limit_write() {
    ...
                /* And commit */
                apply(pool, new_limit);
    ...
}

It also happens during cgroup offline when resetting limits under just an
RCU read lock:

kernel/cgroup/dmem.c:dmemcs_offline() {
    ...
        list_for_each_entry_rcu(pool, &dmemcs->pools, css_node)
                reset_all_resource_limits(pool);
    ...
}

Does this code need a lock to serialize these limit updates?
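
One possible shape for such serialization, as a userspace sketch only. The
C11 atomic_flag spinlock stands in for whatever kernel lock would be
appropriate, and sim_pool, limit_lock and the sim_* functions are
hypothetical names that do not exist in dmem.c.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical pool with a dedicated lock guarding limit updates. */
struct sim_pool {
	atomic_flag limit_lock;
	unsigned long usage;
	unsigned long max;
};

/* Unserialized update: the caller must hold limit_lock. */
static int sim_set_max(struct sim_pool *p, unsigned long new_max)
{
	if (p->usage > new_max)
		return -1;	/* refuse a limit below current usage */
	p->max = new_max;
	return 0;
}

/* Every writer funnels through here, so updates cannot interleave. */
static int sim_set_max_serialized(struct sim_pool *p, unsigned long new_max)
{
	int ret;

	while (atomic_flag_test_and_set_explicit(&p->limit_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	ret = sim_set_max(p, new_max);
	atomic_flag_clear_explicit(&p->limit_lock, memory_order_release);
	return ret;
}

/* Apply two limit writes in order and report the resulting maximum. */
static unsigned long sim_demo_final_max(unsigned long usage,
					unsigned long first,
					unsigned long second)
{
	struct sim_pool p = {
		.limit_lock = ATOMIC_FLAG_INIT,
		.usage = usage,
		.max = ULONG_MAX,
	};

	sim_set_max_serialized(&p, first);
	sim_set_max_serialized(&p, second);
	return p.max;
}
```

Both the limit-write path and the offline reset path would need to take the
same lock for this to help.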


[Severity: High]
This is a pre-existing issue, but can the lack of memory barriers during pool
initialization allow hierarchical limits to be bypassed?

In alloc_pool_single(), a new pool is linked into the RCU list before it is
fully initialized. Later in get_cg_pool_locked(), pool->inited is set without
a release barrier:

kernel/cgroup/dmem.c:get_cg_pool_locked() {
    ...
                pool->cnt.parent = &ppool->cnt;
                if (ppool && !pool->parent) {
                        pool->parent = ppool;
                        dmemcg_pool_get(ppool);
                }
                pool->inited = true;
    ...
}

A concurrent reader in get_cg_pool_unlocked() uses READ_ONCE(pool->inited)
without an acquire barrier:

kernel/cgroup/dmem.c:get_cg_pool_unlocked() {
    ...
        pool = find_cg_pool_locked(cg, region);
        if (pool && !READ_ONCE(pool->inited))
                pool = NULL;
    ...
}

Due to memory reordering, the reader can observe pool->inited == true but
still see a NULL parent pointer. Would adding smp_store_release() and
smp_load_acquire() prevent this race?
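
The publish/observe pairing can be sketched in userspace with C11 atomics,
where memory_order_release and memory_order_acquire play the roles of
smp_store_release() and smp_load_acquire(). sim_pool and the sim_* functions
are hypothetical and only model the ordering, not the actual dmem.c
structures.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool: parent must be visible before inited reads true. */
struct sim_pool {
	struct sim_pool *parent;
	atomic_bool inited;
};

/* Writer: finish initialization, then publish with release semantics. */
static void sim_publish(struct sim_pool *pool, struct sim_pool *parent)
{
	pool->parent = parent;
	atomic_store_explicit(&pool->inited, true, memory_order_release);
}

/* Reader: the acquire load orders the later read of pool->parent. */
static struct sim_pool *sim_lookup(struct sim_pool *pool)
{
	if (!atomic_load_explicit(&pool->inited, memory_order_acquire))
		return NULL;
	return pool;
}

/* Single-threaded sanity check of the publish/lookup protocol. */
static bool sim_demo_parent_visible(void)
{
	struct sim_pool parent = { .parent = NULL };
	struct sim_pool child = { .parent = NULL };
	struct sim_pool *seen;

	if (sim_lookup(&child) != NULL)
		return false;	/* must not be visible before publish */
	sim_publish(&child, &parent);
	seen = sim_lookup(&child);
	return seen != NULL && seen->parent == &parent;
}
```

A single-threaded run cannot demonstrate the reordering itself; the sketch
only shows where the release store and the acquire load would sit.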

-- 
Sashiko AI review ·
https://sashiko.dev/#/patchset/[email protected]?part=1
