[PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
Hello devs, I'd like to propose bringing GEODE-8029 [1] to the *support/1.12* and *support/1.13* branches. The fix has been merged into develop through commit bc0090dc93643fd4d09c79a4b0c29d883172b546 [2], and it's basically to make sure we delete unused drfs upon initialization to prevent the proliferation of unused records and files within the file system, which could cause members to fail during startup while recovering disk-stores. Best regards. [1]: https://issues.apache.org/jira/browse/GEODE-8029 [2]: https://github.com/apache/geode/commit/bc0090dc93643fd4d09c79a4b0c29d883172b546 -- Ju@N
Commit revert etiquette
Hello friends, I recently noticed a couple of PRs submitted that were intended to revert flaky or suspected commits. Initially, these PRs were simply marked as ‘Revert’ without any further detail. I’m not sure whether there was any additional out-of-band communication, but I would like to request that, if you’re submitting a reversion, you do the courtesy of providing additional details as to why the commit is being reverted. That way the reason is also obvious to others who may not be aware of additional communication. Thank you. --Jens
Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
+1 On 6/10/20, 3:18 AM, "Ju@N" wrote: Hello devs, I'd like to propose bringing GEODE-8029 to the *support/1.12* and *support/1.13* branches. The fix has been merged into develop through commit bc0090dc93643fd4d09c79a4b0c29d883172b546, and it's basically to make sure we delete unused drfs upon initialization to prevent the proliferation of unused records and files within the file system, which could cause members to fail during startup while recovering disk-stores. Best regards. -- Ju@N
Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
+1 On Jun 10, 2020, 3:18 AM -0700, Ju@N wrote: Hello devs, I'd like to propose bringing GEODE-8029 [1] to the *support/1.12* and *support/1.13* branches. The fix has been merged into develop through commit bc0090dc93643fd4d09c79a4b0c29d883172b546 [2], and it's basically to make sure we delete unused drfs upon initialization to prevent the proliferation of unused records and files within the file system, which could cause members to fail during startup while recovering disk-stores. Best regards. [1]: https://issues.apache.org/jira/browse/GEODE-8029 [2]: https://github.com/apache/geode/commit/bc0090dc93643fd4d09c79a4b0c29d883172b546 -- Ju@N
RE: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
+1 -----Original Message----- From: Udo Kohlmeyer Sent: Wednesday, June 10, 2020 9:14 AM To: dev@geode.apache.org Subject: Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13 +1
Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
+1 From: Ju@N Sent: June 10, 2020 6:18 To: dev@geode.apache.org Subject: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13 Hello devs, I'd like to propose bringing GEODE-8029 [1] to the *support/1.12* and *support/1.13* branches. The fix has been merged into develop through commit bc0090dc93643fd4d09c79a4b0c29d883172b546 [2], and it's basically to make sure we delete unused drfs upon initialization to prevent the proliferation of unused records and files within the file system, which could cause members to fail during startup while recovering disk-stores. Best regards. [1]: https://issues.apache.org/jira/browse/GEODE-8029 [2]: https://github.com/apache/geode/commit/bc0090dc93643fd4d09c79a4b0c29d883172b546 -- Ju@N
Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
Looks like the community approves, Ju@n. Go ahead and merge. Thanks, Dave On Wed, Jun 10, 2020 at 10:37 AM Joris Melchior wrote: > +1
Re: [PROPOSAL]: BackPort GEODE-8029 to support/1.12 and support/1.13
Thanks everyone! I've cherry-picked the commit into branches support/1.12 [1] and support/1.13 [2]. Best regards. [1]: https://github.com/apache/geode/commit/bdeff9d6144c47ea687cf3b7d25f12598228 [2]: https://github.com/apache/geode/commit/7c1ffd528ff72b920dd606604de2a744c1728b23 On Wed, 10 Jun 2020 at 18:51, Dave Barnes wrote: > Looks like the community approves, Ju@n. Go ahead and merge. > Thanks, > Dave -- Ju@N
Re: Problem in rolling upgrade since 1.12
Ernie made us a ticket for this issue: https://issues.apache.org/jira/browse/GEODE-8240

On Mon, Jun 8, 2020 at 12:59 PM Alberto Gomez wrote:
> Hi Ernie,
>
> I have seen this problem in the support/1.13 branch and also on develop.
>
> Interestingly, the patch I sent applies seamlessly in my local repo set
> to the develop branch.
>
> The patch modifies the
> RollingUpgradeRollServersOnPartitionedRegion_dataserializable test case by
> running "list members" on an upgraded system. I run it manually with the
> following command:
>
> ./gradlew geode-core:upgradeTest --tests=RollingUpgradeRollServersOnPartitionedRegion_dataserializable
>
> I see it failing when upgrading from 1.12.
>
> I created a draft PR where you can also see the changes in the test case
> that manifest the problem.
>
> See: https://github.com/apache/geode/pull/5224
>
> Please let me know if you need any more information.
>
> BR,
>
> Alberto
>
> From: Ernie Burghardt
> Sent: Monday, June 8, 2020 9:04 PM
> To: dev@geode.apache.org
> Subject: Re: Problem in rolling upgrade since 1.12
>
> Hi Alberto,
>
> I’m looking at this and came across a couple of blockers…
> Do you have a branch that exhibits this problem? A draft PR, maybe?
> I tried to apply your patch to the latest develop, but the patch doesn’t pass
> git apply’s check.
> Also, these tests pass on develop. Would you be able to check against the
> latest and update the diff?
> I’m very interested in reproducing the issue you have observed.
>
> Thanks,
> Ernie
>
> From: Alberto Gomez
> Reply-To: "dev@geode.apache.org"
> Date: Monday, June 8, 2020 at 12:31 AM
> To: "dev@geode.apache.org"
> Subject: Re: Problem in rolling upgrade since 1.12
>
> Hi,
>
> I attach a diff for the modified test case in case you would like to use
> it to check the problem I mentioned.
>
> BR,
>
> Alberto
>
> From: Alberto Gomez
> Sent: Saturday, June 6, 2020 4:06 PM
> To: dev@geode.apache.org
> Subject: Problem in rolling upgrade since 1.12
>
> Hi,
>
> I have observed that since version 1.12 rolling upgrades to future
> versions leave the first upgraded locator "as if" it was still on version
> 1.12.
>
> This is the output from "list members" before starting the upgrade from
> version 1.12:
>
> Name | Id
> ---- | ---
> vm2 | 192.168.0.37(vm2:29367:locator):41001 [Coordinator]
> vm0 | 192.168.0.37(vm0:29260):41002
> vm1 | 192.168.0.37(vm1:29296):41003
>
> And this is the output from "list members" after upgrading the first
> locator from 1.12 to 1.13/1.14:
>
> Name | Id
> ---- | ---
> vm2 | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
> vm0 | 192.168.0.37(vm0:810):41002(version:GEODE 1.12.0)
> vm1 | 192.168.0.37(vm1:849):41003(version:GEODE 1.12.0)
>
> Finally, this is the output in gfsh once the rolling upgrade has been
> completed (locators and servers upgraded):
>
> Name | Id
> ---- | ---
> vm2 | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]
> vm0 | 192.168.0.37(vm0:2457):41002
> vm1 | 192.168.0.37(vm1:2576):41003
>
> I verified this by running manual tests and also by running the following
> upgrade test (I had to stop it in the middle to connect via gfsh and get the
> gfsh outputs):
>
> RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable
>
> After the rolling upgrade, the shutdown command fails with the following
> error:
> Member 192.168.0.37(vm2:1453:locator):41001 could not be found.
> Please verify the member name or ID and try again.
>
> The only way I have found to get out of this situation is by restarting
> the locator.
> Once it is restarted, the output of gfsh shows that all members are
> upgraded to the new version, i.e. the locator no longer shows that it
> is on version GEODE 1.12.0.
>
> Does anybody have a clue why this is happening?
>
> Thanks in advance,
>
> /Alberto G.
>
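As a rough illustration of the symptom described in this thread, the following Java sketch scans "list members" output for member IDs that still carry a version tag such as (version:GEODE 1.12.0). It is not part of Geode or gfsh. The class name StaleMemberVersionCheck, the method membersTaggedWith, and the assumption that the output is a simple "Name | Id" table in the form quoted above are all hypothetical, and real gfsh output may be laid out differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Geode or gfsh API.
final class StaleMemberVersionCheck {
  // Matches the "(version:GEODE 1.12.0)" suffix seen in the member IDs quoted in the thread.
  private static final Pattern VERSION_TAG =
      Pattern.compile("\\(version:GEODE ([0-9.]+)\\)");

  /** Returns the names of members whose Id column carries the given version tag. */
  static List<String> membersTaggedWith(String listMembersOutput, String version) {
    List<String> names = new ArrayList<>();
    for (String line : listMembersOutput.split("\\R")) {
      // Assumed layout: "<name> | <id>"; lines without a '|' are skipped.
      String[] columns = line.split("\\|", 2);
      if (columns.length < 2) {
        continue;
      }
      Matcher matcher = VERSION_TAG.matcher(columns[1]);
      if (matcher.find() && matcher.group(1).equals(version)) {
        names.add(columns[0].trim());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Output reported in the thread once locators and servers had been upgraded.
    String afterUpgrade = String.join("\n",
        "Name | Id",
        "---- | ---",
        "vm2 | 192.168.0.37(vm2:1453:locator):41001(version:GEODE 1.12.0) [Coordinator]",
        "vm0 | 192.168.0.37(vm0:2457):41002",
        "vm1 | 192.168.0.37(vm1:2576):41003");
    System.out.println(membersTaggedWith(afterUpgrade, "1.12.0")); // prints [vm2]
  }
}
```

Run against the output captured after the full rolling upgrade, the sketch flags only the locator vm2, which matches what Alberto reports.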
[INFO] Distributed Test runs in bulk results.
Hello All,

I have been doing bulk test runs of DistributedTestOpenJDK8, in this case over 200. Here is a simplified report to help you see what I am seeing, and what I think everybody sees as random failures during the PR process. Geode is a complex system, and it is very easy to cause failures like these by not knowing what is running asynchronously, or by introducing timing constraints that may not hold up in the system, e.g. waiting 5 seconds for a test result that could, unbeknownst to you, take longer.

All of that said, here are the results. There are tickets already open for most if not all of these issues. Please let me know how often you all would like to see these reports…

Thanks,
Mark

***
Overall build success rate: 84.0%

The following test methods see failures in more than one class. There may be a failing *TestBase class.

*.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived: 18 failures:
  ParallelWANPersistenceEnabledGatewaySenderDUnitTest: 7 failures (96.889% success rate)
  ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest: 11 failures (95.111% success rate)

*.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived: 4 failures:
  SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest: 3 failures (98.667% success rate)
  SerialWANPersistenceEnabledGatewaySenderDUnitTest: 1 failures (99.556% success rate)

***
org.apache.geode.management.MemberMXBeanDistributedTest: 3 failures (98.667% success rate)
  testBucketCount https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3247
  testBucketCount https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3241
  testBucketCount https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3199
org.apache.geode.internal.cache.wan.parallel.ParallelWANPersistenceEnabledGatewaySenderDUnitTest: 7 failures (96.889% success rate)
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3335
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3331
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3294
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3285
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3218
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3180
  testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3156

org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionDistributedTest: 1 failures (99.556% success rate)
  testCacheCloseDuringBucketMoveDoesntCauseDataLoss https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3267

org.apache.geode.cache.management.MemoryThresholdsOffHeapDUnitTest: 1 failures (99.556% success rate)
  testDistributedRegionClientPutRejection https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3319

org.apache.geode.internal.cache.wan.offheap.SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest: 3 failures (98.667% success rate)
  testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3239
  testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived https://concourse.apachegeode-ci.info/teams