On Wed, Mar 25, 2026 at 9:47 AM Waiman Long <[email protected]> wrote:
>
> On 3/23/26 8:15 PM, Yosry Ahmed wrote:
> > On Mon, Mar 23, 2026 at 5:46 AM Li Wang <[email protected]> wrote:
> >> On Fri, Mar 20, 2026 at 04:42:35PM -0400, Waiman Long wrote:
> >>> The vmstats flush threshold currently increases linearly with the
> >>> number of online CPUs. As the number of CPUs increases over time, it
> >>> will become increasingly difficult to meet the threshold and update the
> >>> vmstats data in a timely manner. These days, systems with hundreds of
> >>> CPUs or even thousands of them are becoming more common.
> >>>
> >>> For example, the test_memcg_sock test of test_memcontrol always fails
> >>> when running on an arm64 system with 128 CPUs. It is because the
> >>> threshold is now 64*128 = 8192. With 4k page size, it needs changes in
> >>> 32 MB of memory. It will be even worse with larger page size like 64k.
> >>>
> >>> To make the output of memory.stat more correct, it is better to scale
> >>> up the threshold slower than linearly with the number of CPUs. The
> >>> int_sqrt() function is a good compromise as suggested by Li Wang [1].
> >>> An extra 2 is added to make sure that we will double the threshold for
> >>> a 2-core system. The increase will be slower after that.
> >>>
> >>> With the int_sqrt() scale, we can use the possibly larger
> >>> num_possible_cpus() instead of num_online_cpus() which may change at
> >>> run time.
> >>>
> >>> Although there is supposed to be a periodic and asynchronous flush of
> >>> vmstats every 2 seconds, the actual time lag between succesive runs
> >>> can actually vary quite a bit. In fact, I have seen time lags of up
> >>> to 10s of seconds in some cases. So we couldn't too rely on the hope
> >>> that there will be an asynchronous vmstats flush every 2 seconds. This
> >>> may be something we need to look into.
> >>>
> >>> [1] https://lore.kernel.org/lkml/[email protected]/
> >>>
> >>> Suggested-by: Li Wang <[email protected]>
> >>> Signed-off-by: Waiman Long <[email protected]>
> > What's the motivation for this fix? Is it purely to make tests more
> > reliable on systems with larger page sizes?
> >
> > We need some performance tests to make sure we're not flushing too
> > eagerly with the sqrt scale imo. We need to make sure that when we
> > have a lot of cgroups and a lot of flushers we don't end up performing
> > worse.
>
> I will include some performance data in the next version. Do you have
> any suggestion of which readily available tests that I can use for this
> performance testing purpose.
I am not sure which readily available tests can stress this. In the past, I wrote a synthetic workload that, across a large number of cgroups, spawns many userspace readers of memory.stat as well as reclaimers, so that flushing is triggered from both userspace and the kernel. Unfortunately, I don't have it lying around anymore.

