Hi,

A friendly reminder to the community about this request for feedback.
Thanks,
-Alberto G.
________________________________
From: Alberto Gomez <alberto.go...@est.tech>
Sent: Thursday, May 7, 2020 10:44 AM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi again,

Considering that Geode does not support online rollback for the time being, and since we need to be able to roll back even a standalone system, we have been thinking of a procedure to downgrade a Geode cluster that tolerates downtime but does not require us to:
* spin up another cluster to sync from,
* do a restore, or
* import a data snapshot.

The procedure we came up with is:

1. First step: downgrade the locators.
   a. While still on the newer version, export the cluster configuration.
   b. Shut down all locators. Existing clients will continue using their server connections. New clients/connections are not possible.
   c. Start new locators using the old software version and import the cluster configuration. They will form a new cluster. Existing client connections should still work, but new client connections are not yet possible (no servers are connected to the locators).

2. Second step: downgrade the servers.
   a. First, shut down all servers in parallel. This marks the beginning of total downtime.
   b. Now start all servers in parallel, still on the new software version. The servers connect to the cluster formed by the downgraded locators. When the servers are up, the downtime ends and new client connections are possible. The rest of the rollback should be fully online.
   c. Now, one server at a time:
      i. Shut it down, revoke its disk stores and delete its file system.
      ii. Start the server using the old software version. When it is up, the server will pick up the cluster configuration and receive the replicated data and the partitioned region buckets needed to satisfy region redundancy (essentially, it will hold exactly the same data the previous server had).

The above has some important prerequisites:

1. Partitioned regions have redundancy, and the region configuration allows recovery as described above.
2. The client version allows connecting to both the new and the old clusters, i.e. clients must not be using the newer version at the moment the procedure starts.
3. Geode guarantees that a cluster configuration exported from the newer system can be imported into the older system. In case of incompatibility, I expect we could even edit the configuration manually to adapt it to the older system, but it is an open question how the new servers will react when they connect (in step 2b).
4. Geode guarantees that communication between peers running different software versions works, and that recovery of region data works.

Could we have opinions on this offline procedure? It seems to work well, but it probably has caveats we do not see at the moment. What about prerequisites 3 and 4? They hold in the upgrade case, but I am not sure whether they hold in this rollback case.

Best regards,
-Alberto G.
________________________________
From: Anilkumar Gingade <aging...@pivotal.io>
Sent: Thursday, April 23, 2020 12:59 AM
To: geode <dev@geode.apache.org>
Subject: Re: About Geode rolling downgrade

That's right; most of the time, a no-downtime requirement is handled by having replicated cluster setups (a disaster-recovery/backup site). The data is pushed to both systems either by the data ingesters or through a WAN setup. The clusters are upgraded one at a time. If there is a failure during the upgrade, or the upgrade needs to be rolled back, one system will always be up and running.

-Anil.

On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker <aba...@pivotal.io> wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> In cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site. It could be active/active or
> active/standby, but there are already at least 2 clusters with current
> copies of the data. If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability. This is because we
> ensure compatibility across the WAN protocol.
>
> Is that correct?
>
> Anthony
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade <aging...@pivotal.io> wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> >>> the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement; most of the time, they had both an old and a
> > new setup, so they could upgrade or switch back to the older setup.
> > Considering the amount of work involved and the code complexity it brings
> > in, and given that there are already ways to downgrade, it is hard to
> > justify supporting this feature.
> >
> > -Anil.
> >
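To make prerequisite 1 of the offline procedure concrete, here is a toy Python model of its per-server phase (shut a server down, revoke its disk stores, delete its file system, restart it on the old version). It is a hypothetical sketch, not Geode code: `Cluster`, `rolling_wipe`, the round-robin bucket placement and the assumption that redundancy recovery finishes before the next server is stopped are all illustrative simplifications. It only asks whether every partitioned-region bucket keeps at least one copy while each server is wiped in turn.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical toy model: which servers hold a copy of each bucket.

    Assumptions (not Geode behaviour): each bucket has `redundancy + 1`
    copies placed round-robin, and replicated regions are not modelled.
    """
    servers: list
    num_buckets: int
    redundancy: int
    holders: dict = field(default_factory=dict)

    def __post_init__(self):
        n = len(self.servers)
        if self.redundancy + 1 > n:
            raise ValueError("not enough servers to host all redundant copies")
        for b in range(self.num_buckets):
            # one copy on server b % n, redundant copies on the following servers
            self.holders[b] = {self.servers[(b + i) % n] for i in range(self.redundancy + 1)}


def rolling_wipe(cluster):
    """Wipe and restart servers one at a time.

    Returns the sorted list of buckets that lost every copy
    (an empty list means the procedure lost no partitioned data).
    """
    lost = []
    for server in cluster.servers:
        # i. shut down, revoke disk stores, delete file system: its copies vanish
        previous = {b for b, h in cluster.holders.items() if server in h}
        for b in previous:
            cluster.holders[b].discard(server)
        lost.extend(b for b in previous if not cluster.holders[b])
        # ii. restart on the old version; assumed: recovery copies each bucket
        # back from a surviving member before the next server is stopped
        for b in previous:
            if cluster.holders[b]:
                cluster.holders[b].add(server)
    return sorted(set(lost))


if __name__ == "__main__":
    servers = ["server1", "server2", "server3"]
    # 113 is Geode's default total-num-buckets for a partitioned region
    print(rolling_wipe(Cluster(servers, num_buckets=113, redundancy=1)))  # []
    print(len(rolling_wipe(Cluster(servers, num_buckets=113, redundancy=0))))  # 113
```

With one redundant copy, every bucket still lives on another server when its holder is wiped, so nothing is lost; with no redundancy, every bucket disappears at some point. The model says nothing about prerequisites 3 and 4 (configuration and peer compatibility across versions), which a toy simulation like this cannot check.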