Well, yes. But since no devops approaches have been mentioned yet, we really
aren't talking about Continuous Delivery. Continually delivering builds into
production is old hat, and Jack nailed the canonical ways in which it has
been done. It really depends on whether an org is investing in the full
Agile lifecycle. Adopting it a piece at a time is common.

One possible devops approach:

Once you get near full test automation:
: Jenkins builds the target
: chef does due diligence on dependencies
: chef pulls the build over
: chef configures the build once it is installed
: chef takes the machine out of the load balancer's rotation
: chef puts the machine back in once it is launched and sanity tested (by
chef)

<or puppet or any others I'm not familiar with>
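The cycle above can be sketched in a few lines. This is a minimal illustration, not a real Chef or Jenkins API: every name here (LoadBalancer, deploy_node, the step strings) is a hypothetical stand-in, and in practice chef-client recipes and your load balancer's API would perform each step.

```python
# Minimal sketch of the per-node deploy cycle described above, assuming
# near-full test automation. All names are illustrative stand-ins, not a
# real Chef or Jenkins API.

class LoadBalancer:
    def __init__(self, nodes):
        self.in_rotation = set(nodes)

    def remove(self, node):
        self.in_rotation.discard(node)   # idempotent: safe to call twice

    def add(self, node):
        self.in_rotation.add(node)       # idempotent: safe to call twice


def deploy_node(node, lb):
    """One idempotent deploy cycle for a single node; returns the steps run."""
    steps = []
    steps.append("build")                 # Jenkins builds the target
    steps.append("check_dependencies")    # chef does due diligence on deps
    steps.append("pull_build")            # chef pulls the build over
    steps.append("configure")             # chef configures the installed build
    lb.remove(node)                       # chef takes the node out of rotation
    steps.append("sanity_test")           # chef sanity-tests the launched build
    lb.add(node)                          # chef puts the node back in
    return steps


lb = LoadBalancer(["solr1", "solr2"])
steps = deploy_node("solr1", lb)
```

Because removing and re-adding a node are set operations, running the cycle twice leaves the rotation in the same state, which is the idempotency point.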


If you substitute Jack's plan, you get pretty much the same thing, except
that by using devops tools you introduce a little thing called idempotency.



-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Saturday, November 01, 2014 12:25 PM
To: solr-user@lucene.apache.org
Subject: Re: How to update SOLR schema from continuous integration environment

Nice pictures, but that preso does not even begin to answer the question.

With master/slave replication, I do schema migration in two ways, depending
on whether a field is added or removed.

Adding a field:

1. Update the schema on the slaves. A defined field with no data is not a
problem.
2. Update the master.
3. Reindex to populate the field and wait for replication.
4. Update the request handlers or clients to use the new field.
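As a sketch, that ordering can be captured as data. The helper below is hypothetical (real steps would push schema.xml to each host and trigger a reindex); only the ordering matters.

```python
# Hypothetical sketch of the add-field order above; the step strings stand
# in for real actions (pushing schema.xml, reindexing, replication). The
# point is that clients use the field only after the data exists.

def add_field_migration(slaves):
    steps = []
    for slave in slaves:
        steps.append(f"update_schema:{slave}")  # 1. defined field, no data: fine
    steps.append("update_schema:master")        # 2. update the master
    steps.append("reindex")                     # 3. populate the new field...
    steps.append("wait_for_replication")        #    ...and let slaves catch up
    steps.append("update_clients")              # 4. only now query the field
    return steps

steps = add_field_migration(["slave1", "slave2"])
```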

Removing a field is the opposite. I haven't tried lately, but Solr used to
have problems with a field that was in the index but not in the schema.

1. Update the request handlers and clients to stop using the field.
2. Reindex without any data for the field that will be removed, wait for
replication.
3. Update the schema on the master and slaves.
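The removal ordering, sketched the same way (hypothetical placeholder steps): clients stop reading the field first, and the schema entry is dropped last.

```python
# Hypothetical sketch of the remove-field order above, the mirror image of
# adding: stop reads first, empty the field from the index, and drop it
# from the schema last (Solr historically disliked fields present in the
# index but absent from the schema).

def remove_field_migration():
    return [
        "update_clients",                   # 1. stop using the field everywhere
        "reindex_without_field",            # 2. reindex with no data for it
        "wait_for_replication",
        "update_schema:master_and_slaves",  # 3. now drop the field
    ]

steps = remove_field_migration()
```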

I have not tried to automate this for continuous deployment. It isn't a big
deal for a single server test environment. It is the prod deployment that is
tricky.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 1, 2014, at 7:29 AM, Will Martin <wmartin...@gmail.com> wrote:

> http://www.thoughtworks.com/insights/blog/enabling-continuous-delivery-enterprises-testing
> 
> 
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com] 
> Sent: Saturday, November 01, 2014 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to update SOLR schema from continuous integration environment
> 
> In all honesty, incrementally updating resources of a production server is
> a rather frightening proposition. Parallel testing is always a better way to
> go - bring up any changes in a parallel system for testing and then do an
> atomic "swap" - redirection of requests from the old server to the new
> server and then retire the old server only after the new server has had
> enough time to burn in and get past any infant mortality problems.
> 
> That's production. Testing and dev? Who needs the hassle; just tear the
> old server down and bring up the new server from scratch with all resources
> updated from the get-go.
> 
> Oh, and the starting point would be keeping your full set of config and
> resource files under source control so that you can carefully review changes
> before they are "pushed", can compare different revisions, and can easily
> back out a revision with confidence rather than "winging it."
> 
> That said, a lot of production systems these days are not designed for
> parallel operation and swapping out parallel systems, especially for cloud
> and cluster systems. In these cases the reality is more of a "rolling
> update", where one node at a time is taken down, updated, brought up,
> tested, brought back into production, tested some more, and only after
> enough burn-in time do you move to the next node.
> 
> This rolling update may also force you to sequence or stage your changes
> so that old and new nodes are at least relatively compatible. So, the first
> stage would update all nodes, one at a time, to the intermediate compatible
> change, and only when that rolling update of all nodes is complete would you
> move up to the next stage of the update to replace the intermediate update
> with the final update. And maybe more than one intermediate stage is
> required for more complex updates.
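This staged rolling update can be sketched as a nested loop: every node finishes a compatible intermediate stage before any node starts the next. The names below are illustrative, not a real orchestration API.

```python
# Illustrative sketch of a staged rolling update: stage N completes on
# every node before stage N+1 begins anywhere, so intermediate and final
# versions never have to coexist with an incompatible peer.

def staged_rolling_update(nodes, stages):
    """Return (stage, node) pairs; stages never interleave across nodes."""
    plan = []
    for stage in stages:          # finish each stage cluster-wide...
        for node in nodes:        # ...updating one node at a time
            plan.append((stage, node))
    return plan

plan = staged_rolling_update(["node1", "node2"], ["intermediate", "final"])
```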
> 
> Some changes might involve upgrading Java jars as well, in a way that
> might cause nodes to give incompatible results, in which case you may need
> to stage or sequence your Java changes as well, so that you don't make the
> final code change until you have verified that all nodes are running
> intermediate code that is compatible with both old and new nodes.
> 
> Of course, it all depends on the nature of the update. For example, adding
> more synonyms may or may not be harmless with respect to whether existing
> index data becomes invalidated and each node needs to be completely
> reindexed, or if query-time synonyms are incompatible with index-time
> synonyms. Ditto for just about any analysis chain changes - they may be
> harmless, they may require full reindexing, they may simply not work for new
> data (i.e., a synonym is added in response to late-breaking news or an
> addition to a taxonomy) until nodes are updated, or maybe some queries
> become slightly or somewhat inaccurate until the update/reindex is complete.
> 
> So, you might want to have two stages of test system - one to just do a
> raw functional test of the changes, like whether your new synonyms work as
> expected or not, and then the pre-production stage which would be updated
> using exactly the same process as the production system, such as a rolling
> update or staged rolling update as required. The closer that pre-production
> system is run to the actual production, the greater the odds that you can
> have confidence that the update won't compromise the production system.
> 
> The pre-production test system might have, say, 10% of the production data
> and be only 10% the size of the production system.
> 
> In short, for smaller clusters having parallel systems with an atomic
> swap/redirection is probably simplest, while for larger clusters an
> incremental rolling update with thorough testing on a pre-production test
> cluster is the way to go.
> 
> -- Jack Krupansky
> 
> -----Original Message-----
> From: Faisal Mansoor
> Sent: Saturday, November 1, 2014 12:10 AM
> To: solr-user@lucene.apache.org
> Subject: How to update SOLR schema from continuous integration environment
> 
> Hi,
> 
> How do people usually update Solr configuration files from a continuous
> integration environment like TeamCity or Jenkins?
> 
> We have multiple development and testing environments and use WebDeploy-
> and AwsDeploy-type tools to remotely deploy code multiple times a day. To
> update Solr, I wrote a simple node server which accepts a conf folder over
> HTTP, updates the specified core's conf folder, and restarts the Solr
> service.
> 
> Does a standard tool exist for this use case? I know about the Schema REST
> API, but I want to update all the files in the conf folder rather than
> updating a single file or adding and removing synonyms piecemeal.
> 
> Here is the link for the node server I mentioned if anyone is interested.
> https://github.com/faisalmansoor/UpdateSolrConfig
> 
> 
> Thanks,
> Faisal 
> 
> 

