Hi Waiman,

Thanks for responding. I have tried Li Wang's patch, and the problem has been fixed.
# ./test_kmem
ok 1 test_kmem_basic
ok 2 test_kmem_memcg_deletion
ok 3 test_kmem_proc_kpagecgroup
ok 4 test_kmem_kernel_stacks
ok 5 test_kmem_dead_cgroups
ok 6 test_percpu_basic

[root@localhost cgroup]# bash run.sh
run 100 times...
--------------------------------------
proccess: 100/100  status: [ OK ]  failure: 0
--------------------------------------
done
overall: 100  ok: 100  fail: 0

Regarding the lazy percpu stat flushing, I assume this is expected
behavior on RT kernels? If so, can Li Wang's patch be our final
solution? Please correct me if I am wrong.

Thanks

On Wed, Mar 11, 2026 at 10:17 PM Waiman Long <[email protected]> wrote:
>
> On 3/11/26 4:49 AM, Lucas Liu wrote:
> > Hi, recently I met this issue:
> >
> > ./test_kmem
> > ok 1 test_kmem_basic
> > ok 2 test_kmem_memcg_deletion
> > ok 3 test_kmem_proc_kpagecgroup
> > ok 4 test_kmem_kernel_stacks
> > ok 5 test_kmem_dead_cgroups
> > memory.current 24514560
> > percpu 15280000
> > not ok 6 test_percpu_basic
> >
> > In this test, memory.current is 24514560 and percpu is 15280000, a
> > diff of ~9.2MB.
> >
> > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> >
> > With 8 CPUs, MAX_VMSTAT_ERROR works out to 4M of memory. On the RT
> > kernel, labs(current - percpu) is 9.2M, which is the root cause of
> > this failure. I am not sure what value is suitable for this case
> > (2M per CPU, maybe?).
>
> Li Wang had posted patches to address some of the problems in this test.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> It could be the case that lazy percpu stat flushing can also be a factor
> here. In this case, we may need to reread the stat counters again
> several times with some delay to solve this problem.
>
> Cheers,
> Longman
>
