Hi again,

I have been looking further into the possibility of supporting "rolling downgrades" in Geode, similar to rolling upgrades, and I would like to share my findings and ask for some help.

My tests were done upgrading from Geode 1.10 to a recent version in the develop branch and then rolling back (downgrading) to 1.10. I was using one locator and two servers. I am sure my findings would have been different if I had used other Geode versions or another configuration.

By making some changes in the code, I managed to roll back the servers, but I ran into trouble when starting the old locator.

The changes I made were the following:

- I removed the equality check between the local and remote Geode versions in ConnectCommand::connect() so that gfsh was allowed to connect to a Geode system with a newer or older version.
- I started the locators and servers with the gemfire.allow_old_members_to_join_for_testing property to allow old members to join a newer Geode system.
- I changed the Version::fromOrdinal method to return CURRENT instead of throwing an exception when the ordinal passed corresponds to an unsupported version. I had to make this change so that old servers could make progress when reading oplogs generated by newer servers.

After downgrading the servers successfully, I stopped the new locator, started the old one (with the old gfsh) and got an exception in the locator when reading from the view file:

The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /home/alberto/geode/geode-releases/apache-geode-1.0.0/locator1 for full details.

Exception in thread "main" org.apache.geode.InternalGemFireException: Unable to recover previous membership view from /home/alberto/geode/geode-releases/apache-geode-1.10.0/locator1/locator10334view.dat
        at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:492)
        ...
Caused by: java.io.StreamCorruptedException: invalid type code: 02
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2871)
        ...
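
For illustration, the third change listed above (the lenient ordinal lookup) amounts to something like the following. This is only a minimal sketch: SketchVersion, its two constants, their ordinals, and the method names fromOrdinalStrict and fromOrdinalLenient are hypothetical stand-ins and not Geode's actual Version class, and IllegalArgumentException is a placeholder for whatever exception the real method throws.

```java
/** Hypothetical stand-in for a version registry; not Geode's actual Version class. */
final class SketchVersion {
    // Illustrative versions and ordinals only; they do not match Geode's real values.
    static final SketchVersion OLDER = new SketchVersion("OLDER", (short) 1);
    static final SketchVersion CURRENT = new SketchVersion("CURRENT", (short) 2);
    private static final SketchVersion[] KNOWN = {OLDER, CURRENT};

    final String name;
    final short ordinal;

    private SketchVersion(String name, short ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    /** Strict lookup: rejects any ordinal this build does not know about. */
    static SketchVersion fromOrdinalStrict(short ordinal) {
        for (SketchVersion v : KNOWN) {
            if (v.ordinal == ordinal) {
                return v;
            }
        }
        // Placeholder exception type (assumption, see lead-in).
        throw new IllegalArgumentException("Unsupported version ordinal: " + ordinal);
    }

    /** Lenient lookup: an unknown (for example newer) ordinal is treated as CURRENT. */
    static SketchVersion fromOrdinalLenient(short ordinal) {
        for (SketchVersion v : KNOWN) {
            if (v.ordinal == ordinal) {
                return v;
            }
        }
        return CURRENT;
    }

    public static void main(String[] args) {
        System.out.println(fromOrdinalLenient((short) 1).name);  // prints "OLDER"
        System.out.println(fromOrdinalLenient((short) 99).name); // prints "CURRENT"
        try {
            fromOrdinalStrict((short) 99);
        } catch (IllegalArgumentException e) {
            System.out.println("strict lookup rejected ordinal 99");
        }
    }
}
```

The trade-off in the lenient variant is that data tagged with an unknown ordinal is then read as if it were written in the current format, which only works when the two formats happen to be compatible.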

I think the problem lies in the deserialization: the format of the locator's view file changed between the two Geode versions after GEODE-7090. This leads me to think that I might have been successful in the "rolling downgrade" if I had selected other versions of Geode, or I might have run into a different set of problems.

After this research I would like to get some feedback from the community on the following questions:

- Would it be reasonable to restrict future changes in Geode between minor versions so that rolling downgrades are supported? This would imply that changes such as the one made in GEODE-7090 would not be allowed in a minor version change.
- Could the code and configuration changes I made in my tests to support the "rolling downgrade" have any negative side effects that should dissuade us from using them?
- Are there any other things I have not taken into account that would require changes in order to support rolling downgrades?
- Is it even feasible to implement "rolling downgrades" of Geode with some restrictions, or will there always be possible incompatibilities between versions that make it impossible or unreasonably hard to support this kind of feature?

Thanks in advance for your help,

-Alberto G.

On 23/9/19 17:04, Alberto Gomez wrote:
> Hi Anthony,
>
> That's an option but, as you say, the cost in infrastructure is high and
> there are also other problems to solve like how to do the switch between
> systems and how to assure the data consistency among them.
>
> I was thinking that in many cases it might be possible to support a
> rolling downgrade similar to the rolling upgrade given that the rolling
> upgrade already allows the coexistence of old and new members in a cluster.
>
> -Alberto
>
> On 23/9/19 15:55, Anthony Baker wrote:
>> Have you considered using a blue / green deployment approach? It provides
>> more flexibility for these scenarios though the infrastructure cost is high.
>>
>> Anthony
>>
>>
>>> On Sep 23, 2019, at 5:59 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>>>
>>> Hi,
>>>
>>> Looking at the Geode documentation I have not found any reference to
>>> rolling back a Geode upgrade.
>>>
>>> Running some tests, I have observed that once a Geode System has been
>>> upgraded to a later version, it is not possible to rollback the upgrade
>>> even if no data modifications have been done after the upgrade.
>>>
>>> The system protects itself in several places: gfsh does not allow you to
>>> connect to a newer version of Geode, the Oplog files store the version
>>> of the server which prevents an older server to start from a file from a
>>> newer server, the cluster also does not allow older members to join a
>>> cluster with newer members and there are probably other protections I
>>> did not hit.
>>>
>>> Even if you tamper with some of those protections, you can run into
>>> trouble due to compatibility issues. I ran into one when I lifted up the
>>> requirement to have the same gfsh versions using versions 1.8 and 1.10
>>> because it seems there is some configuration exchanged in Json format
>>> whose format has changed between those two versions.
>>>
>>> My question is that if it has ever been considered to support rollback
>>> of Geode upgrades (preferably in rolling mode), at least between systems
>>> under the same major version. In our experience customers often require
>>> the rollback of upgrades.
>>>
>>> Thanks in advance for your help,
>>>
>>> -Alberto G.
>>>