Re: About Geode rolling downgrade

Anilkumar Gingade Wed, 22 Apr 2020 10:44:26 -0700

>> Rolling downgrade is a pretty important requirement for our customers
>> I'd love to hear what others think about whether this feature is worth
>> the overhead of making sure downgrades can always work.


I/We haven't seen users or customers request rolling downgrade as a
critical requirement; most of the time they kept both an old and a new
setup, so they could upgrade or switch back to the older setup.
Considering the amount of work involved and the code complexity it brings
in, and given that there are other ways to downgrade, it is hard to justify
supporting this feature.

-Anil.





On Tue, Apr 21, 2020 at 2:01 PM Dan Smith <dsm...@pivotal.io> wrote:

> > Anyhow, we wonder what would be as of today the recommended or official
> > way to downgrade a Geode system without downtime and data loss?
>
> I think the zero-downtime option is difficult right now. The most
> bulletproof way to downgrade without data loss is probably just to
> export/import the data, but that involves downtime. In many cases, you could
> restart the system with an old version if you have persistent data, because
> the on-disk format doesn't change that often, but that won't work in all
> cases. Or, if you have multiple redundant WAN sites, you could potentially
> shift traffic from one to the other and recreate a WAN site, but that also
> requires some work.
>
> > Rolling downgrade is a pretty important requirement for our customers so
> > we would not like to close the discussion here and instead try to see if it
> > is still reasonable to propose it for Geode maybe relaxing a bit the
> > expectations and clarifying some things.
>
> I agree that rolling downgrade is a useful feature for some cases. I also
> agree we would need to add a lot of tests to make sure we really can
> support it. I'd love to hear what others think about whether this feature
> is worth the overhead of making sure downgrades can always work. As Bruce
> pointed out, we have made changes in the past and we will make changes in
> the future that may need additional logic to support downgrades.
>
> Regarding your downgrade steps, they look reasonable. You might consider
> downgrading the servers first. Rolling *upgrade* upgrades the locators
> first, so up to this point we have only tested a newer locator with an
> older server.
>
> -Dan
>
> On Mon, Apr 20, 2020 at 9:13 AM <alberto.go...@est.tech> wrote:
>
> > Hi,
> >
> > I agree that if we wanted to support limited rolling downgrade some other
> > version interchange needs to be done and extra tests will be required.
> >
> > Nevertheless, this could be done using gfsh or with a startup parameter.
> > For example, in the case you mentioned about the UDP messaging, some
> > command like: "enable UDP messaging" to put the system again in a state
> > equivalent to "upgrade in progress but not yet completed" that would
> > allow old members to join again.
> > I guess for each case there would be particularities but they should not
> > involve a lot of effort because most of the mechanisms needed (the ones
> > that allow old and new members to coexist) will have been developed for
> > the rolling upgrade.
> >
> > Anyhow, we wonder what would be as of today the recommended or official
> > way to downgrade a Geode system without downtime and data loss?
> >
> >
> > ________________________________
> > From: Bruce Schuchardt <bschucha...@pivotal.io>
> > Sent: Friday, April 17, 2020 11:36 PM
> > To: dev@geode.apache.org <dev@geode.apache.org>
> > Subject: Re: About Geode rolling downgrade
> >
> > Hi Alberto,
> >
> > I think that if we want to support limited rolling downgrade some other
> > version interchange needs to be done and there need to be tests that
> > prove that the downgrade works.  That would let us document which versions
> > are compatible for a downgrade and enforce that no one attempts it between
> > incompatible versions.
> >
> > For instance, there is work going on right now that introduces
> > communications changes to remove UDP messaging.  Once rolling upgrade
> > completes it will shut down insecure UDP communications.  At that point
> > there is no way to go back.  If you tried it, the old servers would try to
> > communicate with UDP but the new servers would not have UDP sockets open,
> > for security reasons.
> >
> > As a side note, clients would all have to be rolled back before starting
> > in on the servers.  Clients aren't equipped to talk to an older version
> > server, and servers will reject the client's attempts to create
> > connections.
> >
> > On 4/17/20, 10:14 AM, "Alberto Gomez" <alberto.go...@est.tech> wrote:
> >
> >     Hi Bruce,
> >
> >     Thanks a lot for your answer. We had not thought about the changes in
> >     distributed algorithms when analyzing rolling downgrades.
> >
> >     Rolling downgrade is a pretty important requirement for our customers
> >     so we would not like to close the discussion here and instead try to
> >     see if it is still reasonable to propose it for Geode maybe relaxing a
> >     bit the expectations and clarifying some things.
> >
> >     First, I think supporting rolling downgrade does not mean making it
> >     impossible to upgrade distributed algorithms. It means that you need to
> >     support the new and the old algorithms (just as it is done today with
> >     rolling upgrades) in the upgraded version and also support the
> >     possibility of switching to the old algorithm in a fully upgraded
> >     system.
> >
> >     Second of all, I would say it is not very common to upgrade
> >     distributed algorithms, or at least, it does not seem to have been the
> >     case so far in Geode. Therefore, the burden of adding the logic to
> >     support the rolling downgrade would not be something to be carried in
> >     every release. In my opinion, it will be some extra percentage of work
> >     to be added to the work to support the rolling upgrade of the algorithm
> >     as the rolling downgrade will probably be using the mechanisms
> >     implemented for the rolling upgrade.
> >
> >     Third, we do not need to support the rolling downgrade from any
> >     release to any other older release. We could just support the rolling
> >     downgrade (at least when distributed algorithms are changed) between
> >     consecutive versions. They could be considered special cases like those
> >     when it is required to provide a tool to convert files in order to
> >     assure compatibility.
> >
> >     -Alberto
> >
> >
> >     ________________________________
> >     From: Bruce Schuchardt <bschucha...@pivotal.io>
> >     Sent: Thursday, April 16, 2020 5:04 PM
> >     To: dev@geode.apache.org <dev@geode.apache.org>
> >     Subject: Re: About Geode rolling downgrade
> >
> >     -1
> >
> >     Another reason that we should not support rolling downgrade is that it
> >     makes it impossible to upgrade distributed algorithms.
> >
> >     When we added rolling upgrade support we pretty much immediately ran
> >     into a distributed hang when a test started a Locator using an older
> >     version.  In that release we also introduced the cluster configuration
> >     service (CCS) and along with that we needed to upgrade the distributed
> >     lock service's notion of the "elder" member of the cluster.  Prior to
> >     that change a Locator could not fill this role, but the CCS needed to
> >     be able to use locking and needed a Locator to be able to fill this
> >     role.  During upgrade we used the old "elder" algorithm but once the
> >     upgrade was finished we switched to the new algorithm.  If you
> >     introduced an older Locator into this upgraded cluster it wouldn't
> >     think that it should be the "elder" but the rest of the cluster would
> >     expect it to be the elder.
> >
> >     You could support rolling downgrade in this scenario with extra logic
> >     and extra testing, but I don't think that will always be the case.
> >     Rolling downgrade support would place an immense burden on developers
> >     in extra development and testing in order to ensure that older
> >     algorithms could always be brought back on-line.
> >
> >     On 4/16/20, 4:24 AM, "Alberto Gomez" <alberto.go...@est.tech> wrote:
> >
> >         Hi,
> >
> >         Some months ago I posted a question on this list (see [1]) about
> >         the possibility of supporting "rolling downgrade" in Geode in order
> >         to downgrade a Geode system to an older version, similar to the
> >         "rolling upgrade" currently supported.
> >         With your answers and my investigations my conclusion was that the
> >         main stumbling block to support "rolling downgrades" was the
> >         compatibility of persistent files, which was very hard to achieve
> >         because old members would need to be prepared to support newer
> >         versions of persistent files.
> >
> >         We have come up with a new approach to support rolling downgrades
> >         in Geode which consists of the following procedure:
> >
> >         - For each locator:
> >           - Stop locator
> >           - Remove locator files
> >           - Start locator in older version
> >
> >         - For each server:
> >           - Stop server
> >           - Remove server files
> >           - Revoke missing-disk-stores for server
> >           - Start server in older version
> >
> >         Some extra details about this procedure:
> >         - The starting and stopping of processes may not be possible
> >           using gfsh, as gfsh does not allow managing members in a
> >           different version than its own.
> >         - Redundancy in servers is required
> >         - More than one locator is required
> >         - The allow_old_members_to_join_for_testing parameter needs to be
> >           passed to the members.
> >
> >         I would like to ask two questions regarding this procedure:
> >         - Do you see any issue not considered by this procedure or any
> >           alternative to it?
> >         - Would it be reasonable to make public the
> >           "allow_old_members_to_join_for_testing" parameter (with a new
> >           name) so that it might be a valid option for production systems
> >           to support, for example, the procedure proposed?
> >
> >         Thanks in advance for your answers.
> >
> >         Best regards,
> >
> >         -Alberto G.
> >
> >
> >         [1] http://mail-archives.apache.org/mod_mbox/geode-dev/201910.mbox/%3cb080e98c-5df4-e494-dcbd-383f6d979...@est.tech%3E
