RE: How to update SOLR schema from continuous integration environment

Will Martin Sat, 01 Nov 2014 07:32:17 -0700

http://www.thoughtworks.com/insights/blog/enabling-continuous-delivery-enterprises-testing

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Saturday, November 01, 2014 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: How to update SOLR schema from continuous integration environment

In all honesty, incrementally updating resources of a production server is a 
rather frightening proposition. Parallel testing is always a better way to go - 
bring up any changes in a parallel system for testing and then do an atomic 
"swap" - redirection of requests from the old server to the new server and then 
retire the old server only after the new server has had enough time to burn in 
and get past any infant mortality problems.
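If the old and the new index both live in the same SolrCloud cluster, one way to get that atomic "swap" is a collection alias. Clients always query the alias, and re-pointing it at the freshly built and tested collection redirects traffic in one step. The sketch below assumes SolrCloud and the Collections API's CREATEALIAS action. The host, alias, and collection names are hypothetical, and the function only builds the request URL, so you can send it with any HTTP client.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build a Collections API CREATEALIAS request pointing `alias` at `collection`.

    Issuing CREATEALIAS for an alias that already exists re-points it, so
    clients that query the alias switch from the old collection to the new
    one in a single step.
    """
    params = urlencode({"action": "CREATEALIAS", "name": alias, "collections": collection})
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"
```

Usage: `create_alias_url("http://localhost:8983/solr", "products", "products_v2")`, where "products" is the alias clients query and "products_v2" is the hypothetical new collection. Retire the old collection only after the burn-in period.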

That's production. Testing and dev? Who needs the hassle; just tear the old 
server down and bring up the new server from scratch with all resources updated 
from the get-go.
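On a dev or test box, "from scratch" can be as simple as deleting the core's instance directory and recreating it from the conf folder checked out of source control. This is a minimal sketch under stated assumptions. The paths are hypothetical, Solr is assumed to be stopped (or the core unloaded) before it runs, and it assumes the classic `<instance_dir>/conf` layout.

```python
import shutil
from pathlib import Path


def rebuild_core_dir(instance_dir: Path, checked_out_conf: Path) -> None:
    """Throw away an existing core directory (config AND index data) and
    recreate it with a fresh copy of the source-controlled conf folder."""
    if instance_dir.exists():
        shutil.rmtree(instance_dir)  # drop old config and old index together
    instance_dir.mkdir(parents=True)
    # Everything the core will read comes from the checkout, nothing is left over.
    shutil.copytree(checked_out_conf, instance_dir / "conf")
```

After this, start Solr (or re-create the core) and reindex, so the dev index never mixes old and new analysis.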

Oh, and the starting point would be keeping your full set of config and 
resource files under source control so that you can carefully review changes 
before they are "pushed", can compare different revisions, and can easily back 
out a revision with confidence rather than "winging it."
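Comparing revisions is normally the version control system's job (for example `git diff`). You may also want the same check inside a CI step, for instance to print exactly what a deployment will change before it is pushed. A small sketch using Python's standard `difflib` over two checked-out conf folders could look like this. The directory names are assumptions, and it assumes the conf folder contains only UTF-8 text files.

```python
import difflib
from pathlib import Path


def diff_conf_dirs(old_dir: Path, new_dir: Path) -> str:
    """Return a unified diff of every file that differs between two conf folders.

    Files present in only one folder show up as fully added or removed.
    """
    old_files = {p.relative_to(old_dir) for p in old_dir.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_dir) for p in new_dir.rglob("*") if p.is_file()}
    chunks = []
    for rel in sorted(old_files | new_files):
        old_path, new_path = old_dir / rel, new_dir / rel
        old_lines = old_path.read_text(encoding="utf-8").splitlines(keepends=True) if old_path.exists() else []
        new_lines = new_path.read_text(encoding="utf-8").splitlines(keepends=True) if new_path.exists() else []
        chunks.extend(difflib.unified_diff(old_lines, new_lines,
                                           fromfile=f"old/{rel}", tofile=f"new/{rel}"))
    return "".join(chunks)
```

An empty result means the two revisions are identical, which a CI job could use to skip the Solr deployment step entirely.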

That said, a lot of production systems these days are not designed for parallel 
operation and swapping out parallel systems, especially for cloud and cluster 
systems. In these cases the reality is more of a "rolling update", where one 
node at a time is taken down, updated, brought up, tested, brought back into 
production, tested some more, and only after enough burn in time do you move to 
the next node.
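The rolling update described above is essentially a loop over nodes with a gate at each step. This is a minimal sketch, assuming you supply the node-specific actions yourself. `take_out`, `update`, `health_check`, `put_back`, and `burn_in` are hypothetical callbacks, not Solr APIs. The loop halts at the first node that fails, so a bad change reaches at most one node before the rollout stops.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    take_out: Callable[[str], None],
    update: Callable[[str], None],
    health_check: Callable[[str], bool],
    put_back: Callable[[str], None],
    burn_in: Callable[[str], bool],
) -> List[str]:
    """Update nodes one at a time and return the nodes updated successfully.

    Raises RuntimeError at the first failing node; later nodes are untouched.
    """
    done: List[str] = []
    for node in nodes:
        take_out(node)                  # e.g. remove it from the load balancer
        update(node)                    # push config/jars, restart or reload
        if not health_check(node):      # test before it sees production traffic
            raise RuntimeError(f"{node} failed health check; halting rollout")
        put_back(node)                  # back into production
        if not burn_in(node):           # watch it under real traffic for a while
            raise RuntimeError(f"{node} failed burn-in; halting rollout")
        done.append(node)
    return done
```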

This rolling update may also force you to sequence or stage your changes so 
that old and new nodes are at least relatively compatible. So, the first stage 
would update all nodes, one at a time, to the intermediate compatible change, 
and only when that rolling update of all nodes is complete would you move up to 
the next stage of the update to replace the intermediate update with the final 
update. And maybe more than one intermediate stage is required for more complex 
updates.
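Staging wraps that loop. Each stage is a complete rolling pass over every node, and the next stage starts only once the previous pass has finished everywhere. This is a self-contained sketch with hypothetical names: `apply_stage(node, stage)` stands in for whatever pushes one stage's config to one node and reports whether the node came back healthy.

```python
from typing import Callable, Dict, Sequence


def staged_rollout(
    nodes: Sequence[str],
    stages: Sequence[str],
    apply_stage: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Roll each stage across all nodes before starting the next stage.

    Returns the stage each node ended on. Raises RuntimeError on the first
    failure, so nodes are never more than one stage apart.
    """
    current: Dict[str, str] = {node: "original" for node in nodes}
    for stage in stages:                 # e.g. ["intermediate", "final"]
        for node in nodes:               # one node at a time
            if not apply_stage(node, stage):
                raise RuntimeError(f"stage {stage!r} failed on {node}")
            current[node] = stage
        # Only at this point is every node on `stage`, so the next one may begin.
    return current
```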

Some changes might involve upgrading Java jars as well, in a way that might 
cause nodes to give incompatible results, in which case you may need to stage or 
sequence your Java changes as well, so that you don't make the final code 
change until you have verified that all nodes are running intermediate code 
that is compatible with both old nodes and new nodes.
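The gate before the final code change can be made explicit. Collect what each node reports as its deployed build, and refuse to proceed unless every node already runs an intermediate ("bridge") build that works with both old and new code. This sketch uses hypothetical inputs. How you obtain each node's version (a deployment manifest, a status endpoint of your own, etc.) is an assumption left to you.

```python
from typing import List, Mapping, Set


def ready_for_final_change(node_versions: Mapping[str, str], bridge_versions: Set[str]) -> bool:
    """True only if there is at least one node and every node runs a bridge build."""
    return bool(node_versions) and all(v in bridge_versions for v in node_versions.values())


def nodes_blocking_final_change(node_versions: Mapping[str, str], bridge_versions: Set[str]) -> List[str]:
    """Names of nodes that still need the intermediate build, sorted for stable output."""
    return sorted(n for n, v in node_versions.items() if v not in bridge_versions)
```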

Of course, it all depends on the nature of the update. For example, adding more 
synonyms may or may not be harmless, depending on whether existing index data 
becomes invalid so that each node needs to be completely reindexed, or whether 
query-time synonyms become incompatible with index-time synonyms. Ditto for just 
about any analysis chain change: it may be harmless, it may require full 
reindexing, it may simply not work for new data (e.g., a synonym added in 
response to late-breaking news or an addition to a taxonomy) until nodes are 
updated, or some queries may become slightly or somewhat inaccurate until the 
update/reindex is complete.
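Whether a synonym change forces a reindex depends largely on where the synonyms file is used. A rough first check is to look in schema.xml for analyzers that apply the file at index time. That means an `<analyzer type="index">`, or an `<analyzer>` with no type, which Solr uses at both index and query time. Here is a sketch using Python's standard XML parser. It only inspects the `synonyms="..."` attribute on `<filter>` elements, so treat an empty result as a hint, not a guarantee. The field-type names in the test are made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def _reads_file(analyzer: ET.Element, synonyms_file: str) -> bool:
    """True if any filter in this analyzer lists `synonyms_file` (comma-separated lists allowed)."""
    for filt in analyzer.findall("filter"):
        listed = [name.strip() for name in filt.get("synonyms", "").split(",")]
        if synonyms_file in listed:
            return True
    return False


def index_time_synonym_types(schema_xml: str, synonyms_file: str) -> List[str]:
    """Names of field types whose index-time analysis reads `synonyms_file`.

    Documents already indexed with those types were expanded using the old
    synonyms, so changing the file usually calls for a reindex.
    """
    root = ET.fromstring(schema_xml)
    hits: List[str] = []
    for ftype in root.iter():
        if ftype.tag not in ("fieldType", "fieldtype"):
            continue
        for analyzer in ftype.findall("analyzer"):
            # No type attribute: the same chain runs at index AND query time.
            if analyzer.get("type", "index") == "index" and _reads_file(analyzer, synonyms_file):
                hits.append(ftype.get("name", "?"))
                break
    return hits
```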

So, you might want to have two stages of test system - one to just do a raw 
functional test of the changes, like whether your new synonyms work as expected 
or not, and then the pre-production stage which would be updated using exactly 
the same process as the production system, such as a rolling update or staged 
rolling update as required. The closer that pre-production system is run to the 
actual production, the greater the odds that you can have confidence that the 
update won't compromise the production system.
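The two test stages amount to a promotion pipeline. A change reaches the next environment only after passing the previous one, and pre-production is deployed with the same procedure as production. This is a sketch with hypothetical environment names and a caller-supplied `deploy_and_test` function. In Jenkins or TeamCity the same idea could also be expressed as chained build steps.

```python
from typing import Callable, List, Sequence

# Hypothetical environment names, ordered from least to most production-like.
PIPELINE = ("functional-test", "pre-production", "production")


def promote(change: str, deploy_and_test: Callable[[str, str], bool],
            environments: Sequence[str] = PIPELINE) -> List[str]:
    """Deploy `change` to each environment in order, stopping at the first failure.

    Returns the environments the change reached successfully.
    """
    reached: List[str] = []
    for env in environments:
        if not deploy_and_test(env, change):
            break                      # a failing change never goes further
        reached.append(env)
    return reached
```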

The pre-production test system might have, say, 10% of the production data and 
be only 10% the size of the production system.
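One way to build that smaller pre-production data set is to select documents by a hash of their unique key. That way the same roughly 10% of documents is chosen on every rebuild, and test results stay comparable between runs. This is a sketch. The `id` field and the 10% figure are assumptions to adjust.

```python
import hashlib


def in_sample(doc_id: str, percent: int = 10) -> bool:
    """Deterministically decide whether a document belongs to the test sample.

    Hashing the unique key (instead of calling random()) picks the same
    documents every time the pre-production index is rebuilt.
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Usage: `sample = [d for d in docs if in_sample(d["id"])]` before feeding documents to the pre-production indexer.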

In short, for smaller clusters having parallel systems with an atomic 
swap/redirection is probably simplest, while for larger clusters an incremental 
rolling update with thorough testing on a pre-production test cluster is the 
way to go.

-- Jack Krupansky

-----Original Message-----
From: Faisal Mansoor
Sent: Saturday, November 1, 2014 12:10 AM
To: solr-user@lucene.apache.org
Subject: How to update SOLR schema from continuous integration environment

Hi,

How do people usually update Solr configuration files from a continuous 
integration environment like TeamCity or Jenkins?

We have multiple development and testing environments and use WebDeploy and 
AwsDeploy type tools to remotely deploy code multiple times a day. To update 
Solr, I wrote a simple Node server that accepts the conf folder over HTTP, 
updates the specified core's conf folder, and restarts the Solr service.

Is there a standard tool for this use case? I know about the Schema REST 
API, but I want to update all the files in the conf folder rather than just 
updating a single file or adding or removing synonyms piecemeal.

Here is the link for the node server I mentioned if anyone is interested.
https://github.com/faisalmansoor/UpdateSolrConfig

Thanks,
Faisal
