The Memcached IntegrationJUnitTest hangs the PR IntegrationTest job because Cache.close() calls GeodeMemcachedService.close(), which in turn calls Cache.close() again. The code base has a lot of Cache.close() calls, and any of them could theoretically cause the same kind of problem. I'd hate to add a ThreadLocal<Boolean> isClosingThread or something like it just to allow reentrant calls to Cache.close().
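For reference, here is roughly what the ThreadLocal<Boolean> guard mentioned above could look like. This is only a sketch: SketchCache and SketchService are hypothetical stand-ins, not the real Cache or GeodeMemcachedService, and the teardown counter exists only to make the behavior visible. It assumes the reentrant call arrives on the same thread that is already closing.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReentrantCloseSketch {

  /** Hypothetical stand-in for a service owned by the cache. */
  static class SketchService {
    private final SketchCache cache;

    SketchService(SketchCache cache) {
      this.cache = cache;
    }

    void close() {
      // Reproduces the cycle from the thread: the service closes the cache that is closing it.
      cache.close();
    }
  }

  /** Hypothetical stand-in for the cache; not the real Geode implementation. */
  static class SketchCache {
    // True only on the thread that is currently inside close().
    private final ThreadLocal<Boolean> isClosingThread =
        ThreadLocal.withInitial(() -> Boolean.FALSE);
    private final AtomicInteger teardownCount = new AtomicInteger();
    private SketchService service;

    void setService(SketchService service) {
      this.service = service;
    }

    void close() {
      if (isClosingThread.get()) {
        // Reentrant call from the closing thread: return instead of recursing.
        return;
      }
      isClosingThread.set(Boolean.TRUE);
      try {
        teardownCount.incrementAndGet(); // the real teardown work would go here
        if (service != null) {
          service.close();
        }
      } finally {
        // Always clear the flag so the thread is not left marked as closing.
        isClosingThread.remove();
      }
    }

    int teardownCount() {
      return teardownCount.get();
    }
  }

  public static void main(String[] args) {
    SketchCache cache = new SketchCache();
    cache.setService(new SketchService(cache));
    cache.close();
    System.out.println("teardown ran " + cache.teardownCount() + " time(s)");
  }
}
```

The guard only covers reentrancy on the closing thread. It does nothing for a second thread calling close() concurrently, and it does not make a later close() a no-op, so a real fix would probably still need a closed flag or similar.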
Mark let the IntegrationTest job run for 7+ hours which shows the hang in the Memcached IntegrationJUnitTest. (Thanks Mark!)

On Thu, Apr 16, 2020 at 1:38 PM Kirk Lund <kl...@apache.org> wrote:

> It timed out while running OldFreeListOffHeapRegionJUnitTest but I think
> the tests before it were responsible for the timeout being exceeded. I
> looked through all of the previously run tests and how long each but
> without having some sort of database with how long each test takes, it's
> impossible to know which test or tests take longer in any given PR.
>
> The IntegrationTest job that exceeded the timeout:
> https://concourse.apachegeode-ci.info/builds/147866
>
> The Test Summary for the above IntegrationTest job with Duration for each
> test:
> http://files.apachegeode-ci.info/builds/apache-develop-pr/geode-pr-4963/test-results/integrationTest/1587061092/
>
> Unless we want to start tracking each test class/method and its Duration
> in a database, I don't see how we could look for trends or changes to
> identify test(s) that suddenly start taking longer. All of the tests take
> less than 3 minutes each, so unless one suddenly spikes to 10 minutes or
> more, there's really no way to find the test(s).
>
> On Thu, Apr 16, 2020 at 12:52 PM Owen Nichols <onich...@pivotal.io> wrote:
>
>> Kirk, most IntegrationTest jobs run in 25-30 minutes, but I did see one
>> <https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-pr/jobs/IntegrationTestOpenJDK11/builds/7202>
>> that came in just under 45 minutes but did succeed. It would be nice to
>> know what test is occasionally taking longer and why…
>>
>> Here’s an example of a previous timeout increase (Note that both the job
>> timeout and the callstack timeout should be increased by the same amount):
>> https://github.com/apache/geode/pull/4231
>>
>> > On Apr 16, 2020, at 10:47 AM, Kirk Lund <kl...@apache.org> wrote:
>> >
>> > Unfortunately, IntegrationTest exceeds timeout every time I trigger it. The
>> > cause does not appear to be a specific test or hang. I
>> > think IntegrationTest has already been running very close to the timeout
>> > and is exceeding it fairly often even without my changes in #4963.
>> >
>> > Should we increase the timeout for IntegrationTest? (Anyone know how to
>> > increase it?)