Re: How to update SOLR schema from continuous integration environment

Jack Krupansky Sun, 02 Nov 2014 04:09:16 -0800

Besides, if you follow my methodology with a pre-production test system,
sure, that test system may crash for some schema changes if you don't follow
all the details of my methodology, but Walter won't be able to actually
"see" that internal-only site "crash".

Further, the "crash" would more likely have occurred on your "dev" cluster
first, well before even making it to your pre-production test system.


-- Jack Krupansky

-----Original Message-----
From: Will Martin

Sent: Sunday, November 2, 2014 6:37 AM
To: [email protected]

Subject: RE: How to update SOLR schema from continuous integration
environment


Well. You don't really think I HAVE a solr installation, do you Walter?  ;-)

No, you're right.  The pattern I put out was general.

It depends on the schema change, doesn't it?


-----Original Message-----
From: Walter Underwood [mailto:[email protected]]
Sent: Saturday, November 01, 2014 11:42 PM
To: [email protected]
Subject: Re: How to update SOLR schema from continuous integration
environment

You do that with schema changes and I'll watch your site crash.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/


On Nov 1, 2014, at 8:31 PM, Will Martin <[email protected]> wrote:

Well yes. But since there haven't been any devops approaches yet, we
really aren't talking about Continuous Delivery. Continually
delivering builds into production is old hat and Jack nailed the
canonical manners in which it has been done. It really depends on
whether an org is investing in the full Agile lifecycle. A piece at a
time is common.


One possible devops approach:

Once you get near full test automation
: Jenkins builds the target
: chef does due diligence on dependencies
: chef pulls the build over.
: chef configures the build once it is installed.
: chef takes the machine out of the load balancer's rotation
: chef puts the machine back in once it is launched and sanity tested
(by chef).

<or puppet or any others I'm not familiar with>


If you substitute Jack's plan, you get pretty much the same thing,
except that by using devops tools you introduce a little thing called
idempotency.
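
To make "idempotency" concrete, here is a minimal Python sketch, not
anything chef or puppet actually runs. The helper name ensure_file and
the synonyms.txt file name are made up for illustration. The idea is
that the step describes a desired state, so running it a second time
changes nothing and reports that nothing changed.

```python
import os
import tempfile


def ensure_file(path: str, content: str) -> bool:
    """Make `path` hold exactly `content`; return True only if a write happened."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already in the desired state, so do nothing
    except FileNotFoundError:
        pass  # the file is missing, so it has to be written
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "synonyms.txt")  # hypothetical config file
        print(ensure_file(target, "tv, television\n"))  # first run writes the file
        print(ensure_file(target, "tv, television\n"))  # repeat run is a no-op
```

A deploy built from steps like this can be re-run after a partial
failure without redoing, or undoing, the parts that already succeeded.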

-----Original Message-----
From: Walter Underwood [mailto:[email protected]]
Sent: Saturday, November 01, 2014 12:25 PM
To: [email protected]
Subject: Re: How to update SOLR schema from continuous integration
environment

Nice pictures, but that preso does not even begin to answer the question.

With master/slave replication, I do schema migration in two ways,
depending on whether a field is added or removed.

Adding a field:

1. Update the schema on the slaves. A defined field with no data is
not a problem.
2. Update the master.
3. Reindex to populate the field and wait for replication.
4. Update the request handlers or clients to use the new field.

Removing a field is the opposite. I haven't tried lately, but Solr
used to have problems with a field that was in the index but not in the
schema.


1. Update the request handlers and clients to stop using the field.
2. Reindex without any data for the field that will be removed, wait
for replication.
3. Update the schema on the master and slaves.
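
The two orderings above can be written down as data, so that an
automated job cannot run them in the wrong order. This is only a
sketch: the step names and the run_migration helper are invented here,
and each step is a callable the caller has to supply, since the thread
does not say how any of them would be carried out.

```python
from typing import Callable, Dict, List

# Step order for adding a field: schema first, clients last.
ADD_FIELD = [
    "update_schema_on_slaves",
    "update_schema_on_master",
    "reindex_and_wait_for_replication",
    "update_handlers_and_clients",
]

# Step order for removing a field: clients first, schema last.
REMOVE_FIELD = [
    "update_handlers_and_clients",
    "reindex_and_wait_for_replication",
    "update_schema_on_master_and_slaves",
]

PLANS: Dict[str, List[str]] = {"add": ADD_FIELD, "remove": REMOVE_FIELD}


def run_migration(kind: str, actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the steps for `kind` in order and return the names of completed steps."""
    done: List[str] = []
    for step in PLANS[kind]:
        actions[step]()  # an exception here stops the migration at this step
        done.append(step)
    return done
```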

I have not tried to automate this for continuous deployment. It isn't
a big deal for a single server test environment. It is the prod
deployment that is tricky.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/


On Nov 1, 2014, at 7:29 AM, Will Martin <[email protected]> wrote:

http://www.thoughtworks.com/insights/blog/enabling-continuous-delivery-enterprises-testing



-----Original Message-----
From: Jack Krupansky [mailto:[email protected]]
Sent: Saturday, November 01, 2014 9:46 AM
To: [email protected]
Subject: Re: How to update SOLR schema from continuous integration
environment


In all honesty, incrementally updating resources of a production
server is a rather frightening proposition. Parallel testing is always
a better way to go - bring up any changes in a parallel system for
testing and then do an atomic "swap" - redirection of requests from the
old server to the new server and then retire the old server only after
the new server has had enough time to burn in and get past any infant
mortality problems.


That's production. Testing and dev? Who needs the hassle; just tear
the old server down and bring up the new server from scratch with all
resources updated from the get-go.


Oh, and the starting point would be keeping your full set of config
and resource files under source control so that you can carefully
review changes before they are "pushed", can compare different
revisions, and can easily back out a revision with confidence rather
than "winging it."


That said, a lot of production systems these days are not designed
for parallel operation and swapping out parallel systems, especially
for cloud and cluster systems. In these cases the reality is more of a
"rolling update", where one node at a time is taken down, updated,
brought up, tested, brought back into production, tested some more,
and only after enough burn-in time do you move to the next node.


This rolling update may also force you to sequence or stage your
changes so that old and new nodes are at least relatively compatible.
So, the first stage would update all nodes, one at a time, to the
intermediate compatible change, and only when that rolling update of
all nodes is complete would you move up to the next stage of the update
to replace the intermediate update with the final update. And maybe
more than one intermediate stage is required for more complex updates.
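
As a rough illustration of the staged rolling update described above,
here is a Python sketch. Everything in it is an assumption made for
the example: the rolling_update function, its parameters, and the four
callables stand in for whatever your load balancer and deployment
tooling really provide. Burn-in time is left out and would belong in
the is_healthy check or in a wait between nodes.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_update(
    nodes: Sequence[str],
    stages: Sequence[str],
    remove_from_rotation: Callable[[str], None],
    apply_stage: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    add_to_rotation: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Apply each stage to every node, one node at a time.

    A stage only starts after the previous stage has reached all nodes,
    so old and new nodes never differ by more than one stage.
    """
    completed: List[Tuple[str, str]] = []
    for stage in stages:
        for node in nodes:
            remove_from_rotation(node)
            apply_stage(node, stage)
            if not is_healthy(node):
                # Stop here and leave the node out of rotation for inspection.
                raise RuntimeError(f"{node} failed checks after stage {stage!r}")
            add_to_rotation(node)
            completed.append((stage, node))
    return completed
```

Calling it with stages such as ["intermediate", "final"] gives the
two-pass sequence; a single-element list gives a plain rolling update.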


Some changes might involve upgrading Java jars as well, in a way that
might cause nodes to give incompatible results, in which case you may
need to stage or sequence your Java changes as well, so that you don't
make the final code change until you have verified that all nodes have
intermediate code that is compatible with both old nodes and new nodes.


Of course, it all depends on the nature of the update. For example,
adding more synonyms may or may not be harmless with respect to whether
existing index data becomes invalidated and each node needs to be
completely reindexed, or if query-time synonyms are incompatible with
index-time synonyms. Ditto for just about any analysis chain changes -
they may be harmless, they may require full reindexing, they may
simply not work for new data (i.e., a synonym is added in response to
late-breaking news or an addition to a taxonomy) until nodes are
updated, or maybe some queries become slightly or somewhat inaccurate
until the update/reindex is complete.


So, you might want to have two stages of test system - one to just do
a raw functional test of the changes, like whether your new synonyms
work as expected or not, and then the pre-production stage which would
be updated using exactly the same process as the production system,
such as a rolling update or staged rolling update as required. The
more closely that pre-production system mirrors actual production, the
greater the odds that you can have confidence that the update won't
compromise the production system.


The pre-production test system might have, say, 10% of the production
data and be only 10% the size of the production system.


In short, for smaller clusters having parallel systems with an atomic
swap/redirection is probably simplest, while for larger clusters an
incremental rolling update with thorough testing on a pre-production
test cluster is the way to go.


-- Jack Krupansky

-----Original Message-----
From: Faisal Mansoor
Sent: Saturday, November 1, 2014 12:10 AM
To: [email protected]
Subject: How to update SOLR schema from continuous integration
environment

Hi,

How do people usually update Solr configuration files from a continuous
integration environment like TeamCity or Jenkins?


We have multiple development and testing environments and use
WebDeploy and AwsDeploy type of tools to remotely deploy code multiple
times a day. To update Solr, I wrote a simple node server which accepts
a conf folder over HTTP, updates the specified core's conf folder, and
restarts the Solr service.
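
For readers who want the shape of that "replace conf, then refresh
Solr" step without the node server, here is a small Python sketch. It
is not the code from the linked repository. The function names, paths,
and base URL are placeholders, and it assumes that a CoreAdmin RELOAD
request is an acceptable substitute for the service restart described
above, which will not hold for every kind of change. The sketch only
builds the reload URL; sending the request is left to the caller.

```python
import os
import shutil
import tempfile
from urllib.parse import urlencode


def sync_conf(src_conf: str, core_conf: str) -> None:
    """Copy every file from `src_conf` into the core's conf folder.

    Existing files are overwritten. Files that exist only in `core_conf`
    are left alone, so deletions in `src_conf` are not propagated.
    """
    shutil.copytree(src_conf, core_conf, dirs_exist_ok=True)  # needs Python 3.8+


def reload_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin RELOAD URL for `core` under a Solr base URL."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "build", "conf")     # placeholder build output
        dst = os.path.join(tmp, "mycore", "conf")    # placeholder core folder
        os.makedirs(src)
        with open(os.path.join(src, "schema.xml"), "w", encoding="utf-8") as fh:
            fh.write("<schema/>\n")
        sync_conf(src, dst)
        print(sorted(os.listdir(dst)))
        print(reload_url("http://localhost:8983/solr", "mycore"))
```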


Does there exist a standard tool for this use case? I know about the
schema REST API, but I want to update all the files in the conf folder
rather than just updating a single file or adding or removing synonyms
piecemeal.


Here is the link for the node server I mentioned if anyone is interested.
https://github.com/faisalmansoor/UpdateSolrConfig


Thanks,
Faisal
