In all honesty, incrementally updating resources of a production server is a
rather frightening proposition. Parallel testing is always a better way to
go - bring up any changes in a parallel system for testing and then do an
atomic "swap" - redirection of requests from the old server to the new
server and then retire the old server only after the new server has had
enough time to burn in and get past any infant mortality problems.
That's production. Testing and dev? Who needs the hassle; just tear the old
server down and bring up the new server from scratch with all resources
updated from the get-go.
Oh, and the starting point would be keeping your full set of config and
resource files under source control so that you can carefully review changes
before they are "pushed", can compare different revisions, and can easily
back out a revision with confidence rather than "winging it."
That said, a lot of production systems these days are not designed for
parallel operation and swapping out parallel systems, especially for cloud
and cluster systems. In these cases the reality is more of a "rolling
update", where one node at a time is taken down, updated, brought up,
tested, brought back into production, tested some more, and only after
enough burn in time do you move to the next node.
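The rolling-update loop can be sketched roughly as follows; the node names and the helper commands (take_out, deploy_conf, health_check, put_back) are hypothetical stand-ins for your actual load balancer and deploy tooling:

```shell
# Hedged sketch of a rolling update: one node at a time is taken out,
# updated, tested, and returned to rotation before the next is touched.
set -e

take_out()     { echo "removing $1 from rotation"; }
deploy_conf()  { echo "pushing new conf to $1 and restarting Solr"; }
health_check() { echo "running functional tests against $1"; }
put_back()     { echo "returning $1 to rotation"; }

for node in solr1 solr2 solr3; do
    take_out "$node"
    deploy_conf "$node"
    health_check "$node"
    put_back "$node"
    # allow burn-in time before moving to the next node, e.g.:
    # sleep 3600
done
```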
This rolling update may also force you to sequence or stage your changes so
that old and new nodes are at least relatively compatible. So, the first
stage would update all nodes, one at a time, to the intermediate compatible
change, and only when that rolling update of all nodes is complete would you
move up to the next stage of the update to replace the intermediate update
with the final update. And maybe more than one intermediate stage is
required for more complex updates.
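The staging above amounts to an outer loop over stages wrapped around the rolling update; stage names and the deploy_stage helper here are hypothetical, but the key point is that every node finishes a stage before any node starts the next:

```shell
# Hedged sketch of a staged rolling update: all nodes are brought to the
# intermediate (compatible) configuration before any node gets the final one.
set -e

deploy_stage() { echo "deploying $1 conf to $2"; }

for stage in intermediate final; do
    for node in solr1 solr2 solr3; do
        deploy_stage "$stage" "$node"
        # test and burn in before continuing to the next node
    done
    # only after ALL nodes run this stage's conf does the next stage begin
done
```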
Some changes might involve upgrading Java jars as well, in a way that might
cause nodes to give incompatible results. In that case you may need to stage
or sequence your Java changes too, holding off the final code change until
you have verified that all nodes are running intermediate code that is
compatible with both the old and the new versions.
Of course, it all depends on the nature of the update. For example, adding
more synonyms may or may not be harmless with respect to whether existing
index data becomes invalidated and each node needs to be completely
reindexed, or if query-time synonyms are incompatible with index-time
synonyms. Ditto for just about any analysis chain changes - they may be
harmless, they may require full reindexing, they may simply not work for new
data (e.g., a synonym added in response to late-breaking news or an
addition to a taxonomy) until all nodes are updated, or maybe some queries
become slightly or somewhat inaccurate until the update/reindex is complete.
So, you might want to have two stages of test system - one to just do a raw
functional test of the changes, like whether your new synonyms work as
expected or not, and then the pre-production stage which would be updated
using exactly the same process as the production system, such as a rolling
update or staged rolling update as required. The closer the pre-production
system is to actual production, the more confident you can be that the
update won't compromise the production system.
The pre-production test system might have, say, 10% of the production data
and be only 10% of the size of the production system.
In short, for smaller clusters having parallel systems with an atomic
swap/redirection is probably simplest, while for larger clusters an
incremental rolling update with thorough testing on a pre-production test
cluster is the way to go.
-- Jack Krupansky
-----Original Message-----
From: Faisal Mansoor
Sent: Saturday, November 1, 2014 12:10 AM
To: solr-user@lucene.apache.org
Subject: How to update SOLR schema from continuous integration environment
Hi,
How do people usually update Solr configuration files from a continuous
integration environment like TeamCity or Jenkins?
We have multiple development and testing environments and use WebDeploy and
AwsDeploy type tools to remotely deploy code multiple times a day. To
update Solr, I wrote a simple Node server which accepts a conf folder over
HTTP, updates the specified core's conf folder, and restarts the Solr service.
Does a standard tool exist for this use case? I know about the Schema
REST API, but I want to update all the files in the conf folder rather
than just updating a single file or adding or removing synonyms piecemeal.
Here is the link for the node server I mentioned if anyone is interested.
https://github.com/faisalmansoor/UpdateSolrConfig
Thanks,
Faisal