RE: How to update SOLR schema from continuous integration environment

Will Martin Sun, 02 Nov 2014 03:39:08 -0800

Well. You don't really think I HAVE a solr installation, do you Walter?  ;-)


No, you're right.  The pattern I put out was general.

It depends on the schema change, doesn't it?


-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Saturday, November 01, 2014 11:42 PM
To: solr-user@lucene.apache.org
Subject: Re: How to update SOLR schema from continuous integration
environment

You do that with schema changes and I'll watch your site crash.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 1, 2014, at 8:31 PM, Will Martin <wmartin...@gmail.com> wrote:

> Well yes. But since there haven't been any devops approaches yet, we
> really aren't talking about Continuous Delivery. Continually
> delivering builds into production is old hat and Jack nailed the
> canonical manners in which it has been done. It really depends on
> whether an org is investing in the full Agile lifecycle. A piece at a
> time is common.
> 
> One possible devops approach:
> 
> Once you get near full test automation
> : Jenkins builds the target
> : chef does due diligence on dependencies
> : chef pulls the build over. 
> : chef configures the build once it is installed.
> : chef takes the machine out of the load balancer's rotation
> : chef puts the machine back in once it is launched and sanity tested 
> (by chef).
> 
> <or puppet or any others I'm not familiar with>
> 
> 
> If you substitute Jack's plan, you get pretty much the same thing,
> except that by using devops tools you introduce a little thing called
> idempotency.
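
To make the idempotency point concrete, here is a minimal Python sketch, not Chef itself. The helper name `ensure_file` and the paths are hypothetical; the idea is only that a step describes a desired state and does nothing when the machine is already in it, so rerunning the deployment is safe.

```python
from pathlib import Path


def ensure_file(path: Path, desired: str) -> bool:
    """Converge `path` to `desired` content (hypothetical helper).

    Returns True only if a write actually happened.
    """
    if path.exists() and path.read_text(encoding="utf-8") == desired:
        return False  # already in the desired state, so do nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(desired, encoding="utf-8")
    return True
```

Calling `ensure_file` twice with the same arguments writes the file once; the second call reports no change, which is what makes a half-finished deployment safe to rerun.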
> 
> 
> 
> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Saturday, November 01, 2014 12:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to update SOLR schema from continuous integration 
> environment
> 
> Nice pictures, but that preso does not even begin to answer the question.
> 
> With master/slave replication, I do schema migration in two ways, 
> depending on whether a field is added or removed.
> 
> Adding a field:
> 
> 1. Update the schema on the slaves. A defined field with no data is 
> not a problem.
> 2. Update the master.
> 3. Reindex to populate the field and wait for replication.
> 4. Update the request handlers or clients to use the new field.
> 
> Removing a field is the opposite. I haven't tried lately, but Solr
> used to have problems with a field that was in the index but not in the
> schema.
> 
> 1. Update the request handlers and clients to stop using the field.
> 2. Reindex without any data for the field that will be removed, wait 
> for replication.
> 3. Update the schema on the master and slaves.
> 
> I have not tried to automate this for continuous deployment. It isn't 
> a big deal for a single server test environment. It is the prod 
> deployment that is tricky.
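
The two orderings above could be encoded roughly as follows if one wanted a deployment script to pick the right sequence. This is only a sketch: the `Change` enum and `migration_steps` function are made-up names, and the steps are plain strings standing in for whatever actually performs them.

```python
from enum import Enum
from typing import List


class Change(Enum):
    """Kind of schema change (hypothetical names for illustration)."""
    ADD_FIELD = "add"
    REMOVE_FIELD = "remove"


def migration_steps(change: Change) -> List[str]:
    """Return the ordered steps for a master/slave schema migration."""
    if change is Change.ADD_FIELD:
        # The schema goes out first: a defined field with no data is harmless.
        return [
            "update schema on slaves",
            "update schema on master",
            "reindex to populate the field and wait for replication",
            "update request handlers and clients to use the field",
        ]
    if change is Change.REMOVE_FIELD:
        # The schema goes out last: stop using and indexing the field first.
        return [
            "update request handlers and clients to stop using the field",
            "reindex without data for the field and wait for replication",
            "update schema on master and slaves",
        ]
    raise ValueError(f"unsupported change: {change!r}")
```

The point of keeping the two cases separate is that the schema update sits at opposite ends of the sequence, so a single "push the conf folder and restart" step cannot cover both safely.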
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
> 
> 
> On Nov 1, 2014, at 7:29 AM, Will Martin <wmartin...@gmail.com> wrote:
> 
>> 
>> http://www.thoughtworks.com/insights/blog/enabling-continuous-delivery-enterprises-testing
>> 
>> 
>> -----Original Message-----
>> From: Jack Krupansky [mailto:j...@basetechnology.com]
>> Sent: Saturday, November 01, 2014 9:46 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to update SOLR schema from continuous integration
>> environment
>> 
>> In all honesty, incrementally updating resources of a production
>> server is a rather frightening proposition. Parallel testing is always
>> a better way to go - bring up any changes in a parallel system for
>> testing and then do an atomic "swap" - redirection of requests from the
>> old server to the new server and then retire the old server only after
>> the new server has had enough time to burn in and get past any infant
>> mortality problems.
>> 
>> That's production. Testing and dev? Who needs the hassle; just tear
>> the old server down and bring up the new server from scratch with all
>> resources updated from the get-go.
>> 
>> Oh, and the starting point would be keeping your full set of config
>> and resource files under source control so that you can carefully
>> review changes before they are "pushed", can compare different
>> revisions, and can easily back out a revision with confidence rather
>> than "winging it."
>> 
>> That said, a lot of production systems these days are not designed
>> for parallel operation and swapping out parallel systems, especially
>> for cloud and cluster systems. In these cases the reality is more of a
>> "rolling update", where one node at a time is taken down, updated,
>> brought up, tested, brought back into production, tested some more,
>> and only after enough burn-in time do you move to the next node.
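
A rough Python sketch of the rolling update described above, under the assumption that each operation is supplied as a callback. All function and parameter names are hypothetical; nothing here talks to Solr or a load balancer.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    take_out: Callable[[str], None],
    update: Callable[[str], None],
    passes_tests: Callable[[str], bool],
    put_back: Callable[[str], None],
    burned_in: Callable[[str], bool],
) -> List[str]:
    """Update one node at a time; stop at the first node that fails."""
    updated: List[str] = []
    for node in nodes:
        take_out(node)   # remove the node from production traffic
        update(node)     # apply the change and bring the node back up
        if not passes_tests(node):
            raise RuntimeError(f"{node} failed tests while out of rotation")
        put_back(node)   # return the node to production
        if not burned_in(node):
            raise RuntimeError(f"{node} failed burn-in; halting before the next node")
        updated.append(node)
    return updated
```

Because the loop raises on the first failure, the remaining nodes are never touched and keep serving the old configuration.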
>> 
>> This rolling update may also force you to sequence or stage your
>> changes so that old and new nodes are at least relatively compatible.
>> So, the first stage would update all nodes, one at a time, to the
>> intermediate compatible change, and only when that rolling update of
>> all nodes is complete would you move up to the next stage of the update
>> to replace the intermediate update with the final update. And maybe
>> more than one intermediate stage is required for more complex updates.
>> 
>> Some changes might involve upgrading Java jars as well, in a way that
>> might cause nodes to give incompatible results, in which case you may
>> need to stage or sequence your Java changes as well, so that you don't
>> make the final code change until you have verified that all nodes have
>> intermediate code that is compatible with both old nodes and new nodes.
>> 
>> Of course, it all depends on the nature of the update. For example,
>> adding more synonyms may or may not be harmless with respect to whether
>> existing index data becomes invalidated and each node needs to be
>> completely reindexed, or if query-time synonyms are incompatible with
>> index-time synonyms. Ditto for just about any analysis chain changes -
>> they may be harmless, they may require full reindexing, they may
>> simply not work for new data (e.g., a synonym is added in response to
>> late-breaking news or an addition to a taxonomy) until nodes are
>> updated, or maybe some queries become slightly or somewhat inaccurate
>> until the update/reindex is complete.
>> 
>> So, you might want to have two stages of test system - one to just do
>> a raw functional test of the changes, like whether your new synonyms
>> work as expected or not, and then the pre-production stage which would
>> be updated using exactly the same process as the production system,
>> such as a rolling update or staged rolling update as required. The
>> closer that pre-production system is run to the actual production, the
>> greater the odds that you can have confidence that the update won't
>> compromise the production system.
>> 
>> The pre-production test system might have, say, 10% of the production
>> data and be only 10% the size of the production system.
>> 
>> In short, for smaller clusters having parallel systems with an atomic
>> swap/redirection is probably simplest, while for larger clusters an
>> incremental rolling update with thorough testing on a pre-production
>> test cluster is the way to go.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message-----
>> From: Faisal Mansoor
>> Sent: Saturday, November 1, 2014 12:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: How to update SOLR schema from continuous integration 
>> environment
>> 
>> Hi,
>> 
>> How do people usually update Solr configuration files from a continuous
>> integration environment like TeamCity or Jenkins?
>> 
>> We have multiple development and testing environments and use
>> WebDeploy and AwsDeploy type of tools to remotely deploy code multiple
>> times a day. To update Solr I wrote a simple node server which accepts
>> a conf folder over HTTP, updates the specified core's conf folder and
>> restarts the Solr service.
>> 
>> Does there exist a standard tool for this use case? I know about the
>> schema REST API, but I want to update all the files in the conf folder
>> rather than just updating a single file or adding or removing synonyms
>> piecemeal.
>> 
>> Here is the link for the node server I mentioned if anyone is interested.
>> https://github.com/faisalmansoor/UpdateSolrConfig
>> 
>> 
>> Thanks,
>> Faisal
>> 
>> 
> 
>
