Hi again,

I have been looking further into the possibility of supporting
"rolling downgrades" in Geode, similar to rolling upgrades, and I would
like to share my findings and ask for some help.

My tests consisted of upgrading from Geode 1.10 to a recent build of the
develop branch and then rolling back (downgrading) to 1.10. I used one
locator and two servers. I am sure my findings would have been different
if I had used other Geode versions or another configuration.

By making some code changes, I managed to roll back the servers, but I
ran into trouble when starting the old locator.

The changes I made were the following:

- I removed the equality check between the local and remote versions of
Geode in ConnectCommand::connect() so that gfsh can connect to a Geode
system running a newer or older version than gfsh itself.
- I started the locators and servers with the
gemfire.allow_old_members_to_join_for_testing property to allow old
members to join a newer Geode system.
- I changed the Version::fromOrdinal method to return CURRENT instead of
throwing an exception when the ordinal passed in corresponds to an
unsupported version. I had to make this change so that old servers could
make progress when reading oplogs generated by newer servers.
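
To make the third change concrete, here is a minimal sketch of the idea.
It is not Geode's actual Version class: the class name VersionSketch, the
ordinal values and the two method names are invented for illustration. It
only shows the difference between rejecting an unknown ordinal and
falling back to CURRENT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a version registry. NOT Geode's Version class.
final class VersionSketch {
    // Declared before the constants below so that it is initialized first.
    private static final Map<Short, VersionSketch> KNOWN = new HashMap<>();

    // Ordinal values are invented for this example.
    static final VersionSketch V1_9 = register("1.9", (short) 90);
    static final VersionSketch V1_10 = register("1.10", (short) 100);

    // Newest version that this (old) binary knows about.
    static final VersionSketch CURRENT = V1_10;

    final String name;
    final short ordinal;

    private VersionSketch(String name, short ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    private static VersionSketch register(String name, short ordinal) {
        VersionSketch v = new VersionSketch(name, ordinal);
        KNOWN.put(ordinal, v);
        return v;
    }

    // Strict lookup: an ordinal written by a newer, unknown version is rejected.
    static VersionSketch fromOrdinalStrict(short ordinal) {
        VersionSketch v = KNOWN.get(ordinal);
        if (v == null) {
            throw new IllegalArgumentException("Unsupported version ordinal: " + ordinal);
        }
        return v;
    }

    // Lenient lookup: an unknown ordinal is treated as CURRENT.
    static VersionSketch fromOrdinalLenient(short ordinal) {
        VersionSketch v = KNOWN.get(ordinal);
        return v != null ? v : CURRENT;
    }
}
```

With the lenient lookup an old member keeps going, but it then interprets
data written by a newer version as if it were in its own format, which
is only safe if the format did not actually change between the versions.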

After downgrading the servers successfully, I stopped the new locator, 
started the old one (with the old gfsh) and got an exception in the 
locator when reading from the view file:

The Locator process terminated unexpectedly with exit status 1. Please 
refer to the log file in 
/home/alberto/geode/geode-releases/apache-geode-1.0.0/locator1 for full 
details.

Exception in thread "main" org.apache.geode.InternalGemFireException: 
Unable to recover previous membership view from 
/home/alberto/geode/geode-releases/apache-geode-1.10.0/locator1/locator10334view.dat

     at 
org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:492)
     ...
     Caused by: java.io.StreamCorruptedException: invalid type code: 02

     at 
java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2871)
     ...

I think the problem is in deserialization: the format of the locator's
view file changed between the two Geode versions after GEODE-7090.
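
As an illustration of this class of failure, here is a small
self-contained sketch. I have not checked exactly how the view file
layout changed, so both layouts below are invented and the class
ViewFileMismatchSketch is hypothetical. It only shows that a reader
based on ObjectInputStream rejects bytes that were not written in the
layout it expects. The exception message it hits ("invalid stream
header") differs from the "invalid type code" in my log.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;
import java.util.OptionalInt;

// Hypothetical example. Neither layout is Geode's real view file format.
final class ViewFileMismatchSketch {

    // "Old" layout (invented): a single int written through Java serialization.
    static byte[] writeOldLayout(int viewId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(viewId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "New" layout (invented): a raw format-version short followed by the int.
    static byte[] writeNewLayout(int viewId) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(2);
            out.writeInt(viewId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Old reader: returns the view id, or empty if the bytes are not a
    // stream that ObjectInputStream accepts.
    static OptionalInt readWithOldReader(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return OptionalInt.of(in.readInt());
        } catch (StreamCorruptedException e) {
            // The bytes do not match the serialization stream format.
            return OptionalInt.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling readWithOldReader(writeOldLayout(42)) yields the view id, while
readWithOldReader(writeNewLayout(42)) yields an empty result because the
old reader cannot parse the new bytes.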

This leads me to think that I might have been successful in the "rolling
downgrade" if I had selected other versions of Geode, or that I might
have run into a different set of problems.

After this research I would like to get some feedback from the community 
on the following questions:

- Would it be reasonable to restrict future changes in Geode between 
minor versions so that the rolling downgrade is supported? This would 
imply that changes such as the one done in GEODE-7090 would not be 
allowed for a minor version change.

- Could the code and configuration changes I made in my tests to
support the "rolling downgrade" have any negative side effects
which should dissuade us from using them?

- Are there any other things I have not taken into account that would
require changes in order to support rolling downgrades?

- Is it even feasible to implement "rolling downgrades" of Geode with
some restrictions, or are there always potential incompatibilities
between versions that make it impossible or unreasonably hard to support
this kind of feature?

Thanks in advance for your help,

-Alberto G.

On 23/9/19 17:04, Alberto Gomez wrote:
> Hi Anthony,
>
> That's an option but, as you say, the cost in infrastructure is high and
> there are also other problems to solve like how to do the switch between
> systems and how to assure the data consistency among them.
>
> I was thinking that in many cases it might be possible to support a
> rolling downgrade similar to the rolling upgrade given that the rolling
> upgrade already allows the coexistence of old and new members in a cluster.
>
> -Alberto
>
> On 23/9/19 15:55, Anthony Baker wrote:
>> Have you considered using a blue / green deployment approach?  It provides 
>> more flexibility for these scenarios though the infrastructure cost is high.
>>
>> Anthony
>>
>>
>>> On Sep 23, 2019, at 5:59 AM, Alberto Gomez <alberto.go...@est.tech> wrote:
>>>
>>> Hi,
>>>
>>> Looking at the Geode documentation I have not found any reference to
>>> rolling back a Geode upgrade.
>>>
>>> Running some tests, I have observed that once a Geode system has been
>>> upgraded to a later version, it is not possible to roll back the upgrade
>>> even if no data modifications have been made after the upgrade.
>>>
>>> The system protects itself in several places: gfsh does not allow you to
>>> connect to a newer version of Geode, the Oplog files store the version
>>> of the server, which prevents an older server from starting from a file
>>> written by a newer server, the cluster does not allow older members to
>>> join a cluster with newer members, and there are probably other
>>> protections I did not hit.
>>>
>>> Even if you tamper with some of those protections, you can run into
>>> trouble due to compatibility issues. I ran into one when I lifted up the
>>> requirement to have the same gfsh versions using versions 1.8 and 1.10
>>> because it seems there is some configuration exchanged in Json format
>>> whose format has changed between those two versions.
>>>
>>> My question is whether supporting rollback of Geode upgrades
>>> (preferably in rolling mode) has ever been considered, at least between
>>> systems under the same major version. In our experience customers often
>>> require the rollback of upgrades.
>>>
>>> Thanks in advance for your help,
>>>
>>> -Alberto G.
>>>
