Hello all,

(I'm a newbie, so if my terminology is incorrect or my concepts are wrong, please point me in the right direction. This is the first of several questions to come.)
I've inherited a Solr 4 cloud installation, and we're having issues with disk space on one of our shards. We currently have 64 servers serving one collection, which is managed by a ZooKeeper instance. There are two servers for each shard (32 replicated shards). A service runs constantly and inserts new records into the collection as new data arrives to be indexed.

One of our shards is growing (on disk) disproportionately quickly. When the disk fills up, we start getting 500-series errors from Solr and our websites start to fail. Currently, when we see these errors and IT confirms that the disk is full on this particular server, IT deletes the /data directory and restarts the server (Linux-based). This causes the shard to restart and reload itself from its paired partner.

I would expect there to be a more elegant way to recover from this event, though:

Can anyone point me to a strategy that could be used in a situation like this?

Should we be taking steps to save the indexed information prior to restarting the server (more on this in a separate question)?

Should we be backing up something (anything) prior to the restart?

(I'm still going through the Solr wiki, so if the answer is there, a link is appreciated.)

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Strategy-for-removing-an-active-shard-from-zookeeper-tp4144892.html
Sent from the Solr - User mailing list archive at Nabble.com.
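
P.S. To make the symptom concrete, here is a minimal sketch of the kind of check that could flag the problem before the disk actually fills up. It is not part of our setup and does not talk to Solr at all: the function names, the 2x-median threshold, and the shard names and sizes in the usage lines are all hypothetical, and the per-shard sizes are assumed to have been collected separately (for example with `du` on each server's data directory).

```python
import shutil
import statistics


def disk_used_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Guard against unusual filesystems that report no capacity.
        return 0.0
    return usage.used / usage.total


def flag_outlier_shards(index_bytes_by_shard, factor=2.0):
    """Return the names of shards whose on-disk index size exceeds
    `factor` times the median size across all shards (sorted by name).

    `index_bytes_by_shard` maps a shard name to its index size in bytes.
    The default factor of 2.0 is an arbitrary, illustrative threshold.
    """
    if not index_bytes_by_shard:
        return []
    median = statistics.median(index_bytes_by_shard.values())
    return sorted(
        name
        for name, size in index_bytes_by_shard.items()
        if size > factor * median
    )


# Hypothetical usage: sizes in bytes, gathered by some other means.
sizes = {"shard1": 10, "shard2": 12, "shard3": 11, "shard4": 50}
print(flag_outlier_shards(sizes))
print(round(disk_used_fraction("."), 2))
```

Comparing against the median rather than the mean keeps one oversized shard from dragging the baseline up with it, which seems to fit the "one shard grows much faster than the rest" pattern described above.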