June 1, 2026 at 12:54 AM, "SeongJae Park" <[email protected]> wrote:
>
> On Sun, 31 May 2026 17:17:23 +0800 Kunwu Chan <[email protected]> wrote:
>
> > From: Kunwu Chan <[email protected]>
> >
> > The test aborts if the initial aggregation cycles produce zero
> > tried regions. This can happen on slow machines, causing false
> > failures. Skip empty cycles and retry up to 200 times before
> > giving up. Also check that enough samples were collected before
> > computing the 50th percentile.
>
> I agree this will make the test more reliable. I'm a bit concerned that
> 200 retries could make the test run too long, though.
>
> Also, could you further elaborate on why this can fail on slow machines?
> That is, DAMON will check the access of the 'access_memory_even' process
> every 5ms. Are you thinking that 5ms is too short for 'access_memory_even'
> to make the expected accesses (accessing the 7 regions of 10 MiB size)?
> If so, should we increase the sampling interval before retrying?
>
> I also wonder if the unreliable results you've seen are due to the fact
> that DAMON is not flushing TLB, like we discussed before. If that's the
> case, could we increase the working set size of this test, similar to the
> wss_estimation test?
>

Thanks, SJ. Good points.

I don't yet have enough evidence to say whether this is primarily due to
scheduling delays, a too-short sampling interval, or effects from not
flushing TLB.

I'll investigate the root cause and see if increasing the working set size
or adjusting the test configuration may be a cleaner solution than adding
retries.

I'll drop this patch from v2 for now and revisit it once I better
understand the root cause.

Thanks,
Kunwu

> [1] https://lore.kernel.org/[email protected]
>
> Thanks,
> SJ
>
> [...]
>

