Hi Alberto! Another experiment that might be useful to try is changing a p2p message following [1]. If you follow the steps in the wiki, a rolling upgrade should work ok. But if you then try to do a rolling downgrade, what happens?
Anthony

[1] https://cwiki.apache.org/confluence/display/GEODE/Managing+Backward+Compatibility

> On Sep 26, 2019, at 9:37 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>
> Hi again,
>
> I have been looking further into the possibility of supporting "rolling downgrades" in Geode, similar to rolling upgrades, and I would like to share my findings and ask for some help.
>
> My tests were done upgrading from Geode 1.10 to a recent version in the develop branch and then rolling back (downgrading) to 1.10. I was using one locator and two servers. I am sure my findings would have been different if I had used other Geode versions or another configuration.
>
> By making some code changes, I managed to roll back the servers, but I ran into trouble when starting the old locator.
>
> The changes I made were the following:
>
> - I removed the equality check between the local and remote Geode versions in ConnectCommand::connect() so that a gfsh of a newer or older version was allowed to connect to Geode.
> - I started the locators and servers with the gemfire.allow_old_members_to_join_for_testing property to allow old members to join a newer Geode system.
> - I changed the Version::fromOrdinal method to return CURRENT instead of throwing an exception when the ordinal passed corresponds to an unsupported version. I had to make this change so that old servers could make progress when reading oplogs generated by newer servers.
>
> After downgrading the servers successfully, I stopped the new locator, started the old one (with the old gfsh) and got an exception in the locator when reading from the view file:
>
> The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /home/alberto/geode/geode-releases/apache-geode-1.0.0/locator1 for full details.
>
> Exception in thread "main" org.apache.geode.InternalGemFireException: Unable to recover previous membership view from /home/alberto/geode/geode-releases/apache-geode-1.10.0/locator1/locator10334view.dat
>     at org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:492)
>     ...
> Caused by: java.io.StreamCorruptedException: invalid type code: 02
>     at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2871)
>     ...
>
> I think the problem is in the deserialization, because the format of the locator's view file changed between the two Geode versions after GEODE-7090.
>
> This leads me to think that I might have succeeded with the "rolling downgrade" had I selected other Geode versions, or I might have run into a different set of problems.
>
> After this research I would like to get some feedback from the community on the following questions:
>
> - Would it be reasonable to restrict future changes in Geode between minor versions so that rolling downgrades are supported? This would imply that changes such as the one made in GEODE-7090 would not be allowed in a minor version change.
>
> - Could the code and configuration changes I made in my tests to support the "rolling downgrade" have any negative side effects that should dissuade us from using them?
>
> - Are there any other things I have not taken into account that would require changes in order to support rolling upgrades?
>
> - Is it even feasible to implement "rolling downgrades" of Geode with some restrictions, or are there always possible incompatibilities between versions that make it impossible or unreasonably hard to support this kind of feature?
>
> Thanks in advance for your help,
>
> -Alberto G.
>
> On 23/9/19 17:04, Alberto Gomez wrote:
>> Hi Anthony,
>>
>> That's an option but, as you say, the infrastructure cost is high, and there are also other problems to solve, such as how to switch between the systems and how to ensure data consistency between them.
>>
>> I was thinking that in many cases it might be possible to support a rolling downgrade similar to the rolling upgrade, given that the rolling upgrade already allows old and new members to coexist in a cluster.
>>
>> -Alberto
>>
>> On 23/9/19 15:55, Anthony Baker wrote:
>>> Have you considered using a blue/green deployment approach? It provides more flexibility for these scenarios, though the infrastructure cost is high.
>>>
>>> Anthony
>>>
>>>
>>>> On Sep 23, 2019, at 5:59 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Looking at the Geode documentation, I have not found any reference to rolling back a Geode upgrade.
>>>>
>>>> Running some tests, I have observed that once a Geode system has been upgraded to a later version, it is not possible to roll back the upgrade, even if no data has been modified since the upgrade.
>>>>
>>>> The system protects itself in several places: gfsh does not allow you to connect to a newer version of Geode; the oplog files store the version of the server, which prevents an older server from starting from a file written by a newer server; the cluster does not allow older members to join a cluster with newer members; and there are probably other protections I did not hit.
>>>>
>>>> Even if you tamper with some of those protections, you can run into trouble due to compatibility issues. I ran into one when I lifted the requirement for matching gfsh versions while using versions 1.8 and 1.10: it seems some configuration is exchanged in JSON format, and that format changed between those two versions.
>>>>
>>>> My question is whether supporting rollback of Geode upgrades (preferably in rolling mode) has ever been considered, at least between versions within the same major version. In our experience, customers often require the rollback of upgrades.
>>>>
>>>> Thanks in advance for your help,
>>>>
>>>> -Alberto G.
>>>>
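Anthony's suggested experiment at the top of the thread involves changing a p2p message while keeping it compatible with older members, following the wiki page cited as [1]. The page describes Geode's actual mechanism; what follows is only a standalone, hypothetical illustration of the general idea. The names `VersionedMessageSketch`, `PutMessage`, the wire versions `V1`/`V2` and the `priority` field are invented for this sketch and are not Geode classes. The sender picks the layout from the lower of the two members' versions, so an older member never receives a field it cannot parse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Geode's API: a p2p message that gained a field in
// wire version V2. The stream version is the lower of the two members' versions.
public class VersionedMessageSketch {

    // Hypothetical wire-version numbers; Geode's real version ordinals differ.
    static final short V1 = 1;
    static final short V2 = 2; // adds the "priority" field

    static final class PutMessage {
        final String key;
        final int priority; // new in V2; V1 members do not know this field

        PutMessage(String key, int priority) {
            this.key = key;
            this.priority = priority;
        }

        // Write the layout that matches streamVersion.
        void toData(DataOutputStream out, short streamVersion) throws IOException {
            out.writeUTF(key);
            if (streamVersion >= V2) {
                out.writeInt(priority);
            }
        }

        // Read the layout that matches streamVersion; default the missing field.
        static PutMessage fromData(DataInputStream in, short streamVersion) throws IOException {
            String key = in.readUTF();
            int priority = streamVersion >= V2 ? in.readInt() : 0;
            return new PutMessage(key, priority);
        }
    }

    // Negotiated format for a connection between two members.
    static short streamVersion(short senderVersion, short receiverVersion) {
        return (short) Math.min(senderVersion, receiverVersion);
    }

    static byte[] serialize(PutMessage msg, short streamVersion) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            msg.toData(out, streamVersion);
        }
        return bytes.toByteArray();
    }

    static PutMessage deserialize(byte[] data, short streamVersion) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return PutMessage.fromData(in, streamVersion);
        }
    }

    public static void main(String[] args) throws IOException {
        PutMessage msg = new PutMessage("k1", 7);
        // Upgraded (V2) member sending to a not-yet-upgraded (V1) member.
        short toOld = streamVersion(V2, V1);
        PutMessage atOldPeer = deserialize(serialize(msg, toOld), toOld);
        // Upgraded member sending to another upgraded member.
        short toNew = streamVersion(V2, V2);
        PutMessage atNewPeer = deserialize(serialize(msg, toNew), toNew);
        System.out.println(atOldPeer.key + " priority=" + atOldPeer.priority); // k1 priority=0
        System.out.println(atNewPeer.key + " priority=" + atNewPeer.priority); // k1 priority=7
    }
}
```

In this sketch the negotiation only covers bytes exchanged between running members that know each other's version; anything a newer member has already written to disk is read later with no peer to negotiate with.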
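The third change in the Sep 26 message (making Version::fromOrdinal return CURRENT for unsupported ordinals) is relevant to the question about negative side effects. The toy example below is hypothetical: `VersionLookupSketch`, `KnownVersion`, the ordinals 100/110/120 and the file layout are invented for illustration and are not Geode's `Version` class or its oplog or view-file formats. It shows one possible hazard of a lenient lookup: a 1.10-era reader that treats an unknown, newer ordinal as CURRENT keeps parsing with the old layout and can silently misread the data, whereas a strict lookup fails fast with a clear error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Geode's Version class or file formats: a reader built
// for releases up to 1.10 meets a file written by a newer release.
public class VersionLookupSketch {

    enum KnownVersion {
        V1_9((short) 100),
        V1_10((short) 110);

        static final KnownVersion CURRENT = V1_10;

        final short wireOrdinal;

        KnownVersion(short wireOrdinal) {
            this.wireOrdinal = wireOrdinal;
        }

        // Strict lookup: an ordinal from a newer release is rejected.
        static KnownVersion fromOrdinalStrict(short ordinal) {
            for (KnownVersion v : values()) {
                if (v.wireOrdinal == ordinal) {
                    return v;
                }
            }
            throw new IllegalArgumentException("unsupported version ordinal " + ordinal);
        }

        // Lenient lookup, like the workaround in the thread: unknown ordinals become CURRENT.
        static KnownVersion fromOrdinalLenient(short ordinal) {
            for (KnownVersion v : values()) {
                if (v.wireOrdinal == ordinal) {
                    return v;
                }
            }
            return CURRENT;
        }
    }

    // A hypothetical newer release (ordinal 120) inserts an int "flags" field before the payload.
    static byte[] writeNewerFormat(String payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(120); // writer's version ordinal
            out.writeInt(0);     // new field unknown to the 1.10-era reader
            out.writeUTF(payload);
        }
        return bytes.toByteArray();
    }

    // The 1.10-era reader only knows the layout: ordinal, then payload.
    static String read(byte[] data, boolean lenient) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            short ordinal = in.readShort();
            KnownVersion version = lenient
                    ? KnownVersion.fromOrdinalLenient(ordinal)
                    : KnownVersion.fromOrdinalStrict(ordinal);
            // Both known versions share this layout, so parsing continues regardless of version.
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeNewerFormat("view-data");
        // Lenient: no error, but the zero "flags" bytes are read as an empty string length.
        System.out.println("lenient read: \"" + read(file, true) + "\"");
        try {
            read(file, false);
        } catch (IllegalArgumentException e) {
            System.out.println("strict read failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints the lenient result as an empty string instead of `view-data`, and the strict read as an `IllegalArgumentException` naming ordinal 120. Whether Geode's real readers would misread, fail later with a low-level error like the `StreamCorruptedException` in the thread, or be unaffected depends on the actual formats involved.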