June 1, 2026 at 12:54 AM, "SeongJae Park" <[email protected]> wrote:


> 
> On Sun, 31 May 2026 17:17:23 +0800 Kunwu Chan <[email protected]> wrote:
> 
> > 
> > From: Kunwu Chan <[email protected]>
> >  
> >  The test aborts if the initial aggregation cycles produce zero
> >  tried regions. This can happen on slow machines, causing false
> >  failures. Skip empty cycles and retry up to 200 times before
> >  giving up. Also check that enough samples were collected before
> >  computing the 50th percentile.
> > 
> I agree this will make the test more reliable. I'm a bit concerned that 200
> retries could make the test run too long, though.
> 
> Also, could you elaborate on why this can fail on slow machines? That is,
> DAMON will check the accesses of the 'access_memory_even' process every 5ms.
> Are you thinking that 5ms is too short for 'access_memory_even' to make the
> expected accesses (accessing the 7 regions of 10 MiB size) within? If so,
> should we increase the sampling interval before retrying?
> 
> I also wonder whether the unreliable results you have seen are due to the
> fact that DAMON is not flushing the TLB, as we discussed before. If that's
> the case, could we increase the working set size of this test, similar to the
> wss_estimation test?
> 
Thanks, SJ.

Good points.

I don't yet have enough evidence to say whether this is primarily due to
scheduling delays, a too-short sampling interval, or effects from not
flushing TLB.

I'll investigate the root cause and see if increasing the working set
size or adjusting the test configuration may be a cleaner solution than
adding retries.

I'll drop this patch from v2 for now and revisit it once I better
understand the root cause.
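
For reference, the approach described in the patch (skip empty cycles,
bound the number of retries, and check the sample count before taking the
50th percentile) can be sketched roughly as below. This is a minimal
illustration, not the patch itself: the function names, the
read_nr_tried_regions callback, and the MIN_SAMPLES threshold are all
hypothetical, and counting empty cycles in total rather than consecutively
is an assumption.

```python
# Hypothetical sketch; not the actual selftest code.
MAX_EMPTY_RETRIES = 200   # retry bound from the patch description
MIN_SAMPLES = 2           # assumed threshold; the real value is not shown here


def collect_samples(read_nr_tried_regions, nr_wanted,
                    max_empty=MAX_EMPTY_RETRIES):
    """Collect per-cycle tried-region counts, skipping empty cycles.

    read_nr_tried_regions is a caller-supplied function (hypothetical)
    returning the number of tried regions seen in one aggregation cycle.
    """
    samples = []
    empty = 0
    while len(samples) < nr_wanted:
        nr = read_nr_tried_regions()
        if nr == 0:
            empty += 1
            if empty > max_empty:
                break      # give up; the caller decides whether to fail
            continue
        samples.append(nr)
    return samples


def median_or_none(samples, min_samples=MIN_SAMPLES):
    """Return the 50th percentile, or None if too few samples exist."""
    if len(samples) < min_samples:
        return None
    ordered = sorted(samples)
    return ordered[len(ordered) // 2]
```

With this shape, the worst-case extra run time is bounded by the retry limit
multiplied by the duration of one aggregation cycle, which is the cost the
review comment above is concerned about.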

Thanks,
Kunwu

> [1] https://lore.kernel.org/[email protected]
> 
> Thanks,
> SJ
> 
> [...]
>
