Strategy for removing an active shard from zookeeper

tomasv Mon, 30 Jun 2014 16:47:19 -0700

Hello all,
(I'm a newbie, so if my terminology is incorrect or my concepts are wrong,
please point me in the right direction. This is the first of several
questions to come.)

I've inherited a SOLR 4 cloud installation and we're having some issues with
disk space on one of our shards.

We currently have 64 servers serving a collection. The collection is managed
by a ZooKeeper instance. There are two servers for each shard (32 shards,
each with two replicas).

We have a service that is constantly running and inserting new records into
our collection as we get new data to be indexed.

One of our shards is growing (on disk) disproportionately quickly. When
the disk gets full, we start getting 500-series errors from the SOLR system
and our websites start to fail.
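
To make the failure mode concrete, here is a minimal sketch of the kind of
disk-usage check that could warn us before the disk fills. The path and the
90% threshold are assumptions for illustration, not values from our
installation, and the function names are hypothetical; it relies only on
Python's standard `shutil.disk_usage`.

```python
import shutil


def disk_usage_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.90):
    """Return True if the filesystem holding `path` is at or above `threshold` usage.

    The 0.90 default is an assumed alert level, not a Solr recommendation.
    """
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    # "." is a placeholder; in practice this would be the volume holding the index.
    path = "."
    print("usage: {:.1%}".format(disk_usage_fraction(path)))
    print("nearly full:", is_nearly_full(path))
```

Run from cron or a monitoring agent, a check like this would at least let IT
act before the 500-series errors begin rather than after.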

Currently, when we start seeing these errors and IT sees that the disk is
full on this particular server, the folks in IT delete the /data directory
and restart the server (Linux-based). This has the effect of causing the
shard to restart and reload its index from its paired partner.
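
Since this recovery depends entirely on the paired partner being healthy, a
check along the following lines before deleting anything seems prudent. This
is only a sketch: it assumes the cluster state has already been fetched and
parsed into a dict with a collection -> "shards" -> "replicas" -> "state"
layout (my understanding of Solr 4's clusterstate.json, which should be
verified against the actual cluster). The collection, shard, and replica
names in the sample are hypothetical.

```python
def active_replicas(cluster_state, collection, shard):
    """Return the names of replicas of `shard` whose state is "active"."""
    replicas = cluster_state[collection]["shards"][shard]["replicas"]
    return [name for name, info in replicas.items() if info.get("state") == "active"]


def safe_to_wipe(cluster_state, collection, shard, replica):
    """Return True if at least one *other* replica of the shard is active.

    If this is False, deleting `replica`'s data would leave no healthy
    copy of the shard to recover from.
    """
    others = [r for r in active_replicas(cluster_state, collection, shard) if r != replica]
    return len(others) > 0


if __name__ == "__main__":
    # Hypothetical sample data in the assumed layout.
    sample_state = {
        "collection1": {
            "shards": {
                "shard7": {
                    "replicas": {
                        "core_node13": {"state": "active"},
                        "core_node14": {"state": "down"},
                    }
                }
            }
        }
    }
    # Wiping the "down" replica is fine because its partner is active.
    print(safe_to_wipe(sample_state, "collection1", "shard7", "core_node14"))
    # Wiping the only active replica is not.
    print(safe_to_wipe(sample_state, "collection1", "shard7", "core_node13"))
```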

But I would expect that there is a more elegant way to recover from this
event.

Can anyone point me to a strategy that may be used in an instance such as
this? Should we be taking steps to save the indexed information prior to
restarting the server (more on this in a separate question)? Should we be
backing up something (anything) prior to the restart?

(I'm still going through the SOLR wiki, so if the answer is there, a link is
appreciated.)

Thanks!

--
View this message in context:
http://lucene.472066.n3.nabble.com/Strategy-for-removing-an-active-shard-from-zookeeper-tp4144892.html
Sent from the Solr - User mailing list archive at Nabble.com.
