Hi Ernie, I have seen this problem in the support/1.13 branch and also on develop.
Interestingly, the patch I sent applies cleanly in my local repo set to the develop branch. The patch modifies the RollingUpgradeRollServersOnPartitionedRegion_dataserializable test case so that it runs "list members" on an upgraded system.

I run it manually with the following command:

./gradlew geode-core:upgradeTest --tests=RollingUpgradeRollServersOnPartitionedRegion_dataserializable

I see it failing when upgrading from 1.12.

I created a draft PR where you can also see the changes in the test case that expose the problem. See: https://github.com/apache/geode/pull/5224

Please let me know if you need any more information.

BR,
Alberto

________________________________
From: Ernie Burghardt <burghar...@vmware.com>
Sent: Monday, June 8, 2020 9:04 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Problem in rolling upgrade since 1.12

Hi Alberto,

I’m looking at this and came across a couple of blockers. Do you have a branch that exhibits this problem? A draft PR, maybe? I tried to apply your patch to the latest develop, but the patch doesn’t pass git apply’s check. Also, these tests pass on develop; would you be able to check against the latest and update the diff? I’m very interested in reproducing the issue you have observed.

Thanks,
Ernie

From: Alberto Gomez <alberto.go...@est.tech>
Reply-To: "dev@geode.apache.org" <dev@geode.apache.org>
Date: Monday, June 8, 2020 at 12:31 AM
To: "dev@geode.apache.org" <dev@geode.apache.org>
Subject: Re: Problem in rolling upgrade since 1.12

Hi,

I attach a diff for the modified test case in case you would like to use it to check the problem I mentioned.

BR,
Alberto

________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Saturday, June 6, 2020 4:06 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Problem in rolling upgrade since 1.12

Hi,

I have observed that since version 1.12 rolling upgrades to future versions leave the first upgraded locator "as if" it was still on version 1.12.

This is the output from "list members" before starting the upgrade from version 1.12:

Name | Id
---- | -----------------------------------------------------------
vm2  | 192.168.0.37(vm2:29367:locator)<ec><v0>:41001 [Coordinator]
vm0  | 192.168.0.37(vm0:29260)<v1>:41002
vm1  | 192.168.0.37(vm1:29296)<v2>:41003

And this is the output from "list members" after upgrading the first locator from 1.12 to 1.13/1.14:

Name | Id
---- | --------------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator)<ec><v8>:41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:810)<v1>:41002(version:GEODE 1.12.0)
vm1  | 192.168.0.37(vm1:849)<v2>:41003(version:GEODE 1.12.0)

Finally, this is the output in gfsh once the rolling upgrade has been completed (locators and servers upgraded):

Name | Id
---- | --------------------------------------------------------------------------------
vm2  | 192.168.0.37(vm2:1453:locator)<ec><v8>:41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457)<v23>:41002
vm1  | 192.168.0.37(vm1:2576)<v25>:41003

I verified this by running manual tests and also by running the following upgrade test (had to stop it in the middle to connect via gfsh and get the gfsh outputs):

RollingUpgradeRollServersOnPartitionedRegion_dataserializable.testRollServersOnPartitionedRegion_dataserializable

After the rolling upgrade, the shutdown command fails with the following error:

Member 192.168.0.37(vm2:1453:locator)<ec><v8>:41001 could not be found. Please verify the member name or ID and try again.

The only way I have found to get out of this situation is to restart the locator. Once it has been restarted, the gfsh output shows that all members are upgraded to the new version, i.e. the locator no longer shows that it is on version GEODE 1.12.0.

Does anybody have any clue why this is happening?

Thanks in advance,
/Alberto G.
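
For anyone trying to reproduce this, a small check over captured "list members" output can flag members that still carry a version suffix after the upgrade. The following is only a sketch in Python: it assumes the "(version:GEODE x.y.z)" suffix appears exactly as in the outputs quoted in this thread, and the function name members_reporting_version is hypothetical, not part of Geode or gfsh.

```python
import re

# Assumption: the suffix format is taken from the gfsh outputs quoted in this
# thread, e.g. "...:41001(version:GEODE 1.12.0) [Coordinator]".
VERSION_SUFFIX = re.compile(r"\(version:GEODE (\d+(?:\.\d+)*)\)")


def members_reporting_version(list_members_output):
    """Return {member name: version} for table rows that carry a version suffix.

    Hypothetical helper for illustration; it only parses text that was
    captured from "list members" and does not talk to a cluster.
    """
    flagged = {}
    for line in list_members_output.splitlines():
        if "|" not in line:
            continue  # skip blank lines and anything that is not a table row
        name, member_id = (part.strip() for part in line.split("|", 1))
        match = VERSION_SUFFIX.search(member_id)
        if match:
            flagged[name] = match.group(1)
    return flagged


# Output captured after the full rolling upgrade, as quoted in this thread.
SAMPLE = """\
Name | Id
---- | ----
vm2  | 192.168.0.37(vm2:1453:locator)<ec><v8>:41001(version:GEODE 1.12.0) [Coordinator]
vm0  | 192.168.0.37(vm0:2457)<v23>:41002
vm1  | 192.168.0.37(vm1:2576)<v25>:41003
"""

if __name__ == "__main__":
    print(members_reporting_version(SAMPLE))
```

With the final output quoted above, only the locator vm2 is flagged, which matches the symptom described: the servers no longer show a version suffix, while the first upgraded locator still shows GEODE 1.12.0.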