On Thu, Mar 12, 2026 at 06:18:09PM +0800, Li Wang wrote:
> Waiman Long wrote:
> > On 3/11/26 4:49 AM, Lucas Liu wrote:
> > > Hi, recently I met this issue:
> > >
> > > ./test_kmem
> > > ok 1 test_kmem_basic
> > > ok 2 test_kmem_memcg_deletion
> > > ok 3 test_kmem_proc_kpagecgroup
> > > ok 4 test_kmem_kernel_stacks
> > > ok 5 test_kmem_dead_cgroups
> > > memory.current 24514560
> > > percpu 15280000
> > > not ok 6 test_percpu_basic
> > >
> > > In this test, memory.current is 24514560 and percpu is 15280000, a
> > > difference of ~9.2MB.
> > >
> > > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> > >
> > > On this machine (8 CPUs), MAX_VMSTAT_ERROR works out to 2MB. On the
> > > RT kernel, labs(current - percpu) is 9.2MB, which is the root cause
> > > of this failure. I am not sure what value is suitable for this case
> > > (2MB per CPU, maybe?).
> >
> > Li Wang had posted patches to address some of the problems in this
> > test:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > It could be the case that lazy percpu stat flushing is also a factor
> > here. In that case, we may need to reread the stat counters several
> > times with some delay to solve this problem.
>
> When memory.stat is read, the kernel calls mem_cgroup_flush_stats(),
> which invokes cgroup_rstat_flush() to drain per-cpu counters before
> returning results. So in the normal read path the stats are flushed;
> they aren't arbitrarily stale at the point this test reads them.
>
> The "lazy" aspect, as I understand it, is that the flush may sometimes
> be skipped: __mem_cgroup_flush_stats() skips the flush if the total
> pending update is below a threshold, i.e.
>
> 575 static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
> 576 {
> 577         return atomic64_read(&vmstats->stats_updates) >
> 578                MEMCG_CHARGE_BATCH * num_online_cpus();
> 579 }
>
> So the "lazy" skip could matter on a machine with many CPUs, where that
> threshold becomes non-trivial and could contribute a few MB of
> discrepancy.
>
> But the failure I observed was on a 3-CPU box, so it shouldn't be
> hitting the "lazy" skip:
>
> # ./test_kmem
> TAP version 13
> 1..6
> ok 1 test_kmem_basic
> ok 2 test_kmem_memcg_deletion
> ok 3 test_kmem_proc_kpagecgroup
> ok 4 test_kmem_kernel_stacks
> ok 5 test_kmem_dead_cgroups
> memory.current 11530240
> percpu 8440000
> not ok 6 test_percpu_basic
> # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> # uname -r
> 6.12.0-211.el10.aarch64
>
> # getconf PAGE_SIZE
> 4096
>
> # lscpu
> Architecture:        aarch64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              3
> On-line CPU(s) list: 0-2
> ...
>
> Even on Lucas's test system (8 CPUs), assuming a 4k page size, the
> threshold is 2MB, which is still well below the failing result:
>
> 64 × 8 = 512 pages = 512 × 4096 bytes = 2MB
>
> Based on the above two test results, the deviation produced by lazy
> flushing does not look like the root cause.
BTW, if the lazy flush does become a problem on large-CPU machines in real
testing, we can add a retry loop (as Waiman suggested) in a separate patch.
But I'd prefer to keep this one focused on the missing slab accounting
first.

--
Regards,
Li Wang
