Hi Waiman:
Thanks for responding. I have tried Li Wang's patch, and the problem is fixed.

# ./test_kmem
ok 1 test_kmem_basic
ok 2 test_kmem_memcg_deletion
ok 3 test_kmem_proc_kpagecgroup
ok 4 test_kmem_kernel_stacks
ok 5 test_kmem_dead_cgroups
ok 6 test_percpu_basic
[root@localhost cgroup]# bash run.sh
run 100 times...
--------------------------------------
proccess: 100/100  status: [  OK  ]  failure: 0
--------------------------------------
done
overall: 100
ok: 100
fail: 0


Regarding the lazy percpu stat flushing, I assume this is expected behavior
on RT kernels? If so, can Li Wang's patch be our final solution? Please
correct me if I am wrong.

Thanks

On Wed, Mar 11, 2026 at 10:17 PM Waiman Long <[email protected]> wrote:
>
> On 3/11/26 4:49 AM, Lucas Liu wrote:
> > Hi, recently I ran into this issue:
> >   ./test_kmem
> > ok 1 test_kmem_basic
> > ok 2 test_kmem_memcg_deletion
> > ok 3 test_kmem_proc_kpagecgroup
> > ok 4 test_kmem_kernel_stacks
> > ok 5 test_kmem_dead_cgroups
> > memory.current 24514560
> > percpu 15280000
> > not ok 6 test_percpu_basic
> >
> > In this test, memory.current is 24514560 and percpu is 15280000, a
> > difference of ~9.2MB.
> >
> > #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
> >
> > With 8 CPUs, MAX_VMSTAT_ERROR comes to ~2MB. On the RT kernel,
> > labs(current - percpu) is ~9.2MB, which is the root cause of this
> > failure. I am not sure what value would be suitable for this case
> > (2MB per CPU, maybe?)
>
> Li Wang had posted patches to address some of the problems in this test.
>
> https://lore.kernel.org/lkml/[email protected]/
>
> It could be that lazy percpu stat flushing is also a factor here. In
> that case, we may need to reread the stat counters several times with
> some delay to solve this problem.
>
> Cheers,
> Longman
>

