On 5/4/2013 2:06 AM, Kumar Limbu wrote:
> We have Solr setup on 3 machines with only a single shard. We are using Solr
> 4.0 and currently have around 7 Million documents in our index. The size of
> our index is around 25 GB. We have a zookeeper ensemble of 3 zookeeper
> instances.
>
> Let's call the servers in our setup server (A), (B) and (C). All updates to
> Solr goes via server (C). Searches are performed on server (A) and (B). The
> updates are normally propagated incrementally from server (C) to the other 2
> servers. Intermittently we have noted that the servers (A) and (B) makes a
> full copy of the index from server (C). This is not ideal because when this
> happens performance suffers. This occurs quite randomly and can occur on any
> of the other 2 nodes i.e. (A) and (B).
>
> On the server (C), which is the leader, we see errors like the following. We
> suspect this might be the reason why a full index copy occurs in the other
> nodes but we haven't been able to find out why this error is occurring.
> There is no connectivity issue with the servers.
Advance warning: this is a long reply.

The first thing that jumped out at me was the Solr version. Version 4.0 was brand new in October of last year. It's a senior citizen now. It has a lot of bugs, particularly in SolrCloud stability. I would recommend upgrading to at least 4.2.1. Version 4.3.0 (the fourth release since 4.0) is quite literally about to be unveiled. It is already on a lot of download mirrors; the announcement is due any time now.

Now for things to consider that don't involve upgrading, but that might still be issues after upgrading:

You might be able to make your system more stable by increasing your zkClientTimeout. A typical example value for this setting is 15 seconds.

Next, why you might be exceeding the timeout: slow operations, especially on commits, can be responsible for exceeding timeouts. One of the things you can do to decrease commit time is to lower the autowarmCount on your Solr caches. You can also decrease the frequency of your commits.

A 25GB index is relatively large, and requires a lot of memory for proper operation. The reason it requires a lot of memory is that Solr is very reliant on the operating system disk cache, which uses free memory:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

With a 25GB index, you want to have between 15 and 25GB of memory over and above the memory that your programs use. You would probably want to give the Java heap for Solr between 4 and 8GB. For a dedicated Solr server with your index, a really good amount of total system memory would be 32GB, with 24GB being a reasonable starting point.

It should go without saying that you need a 64-bit server, a 64-bit operating system, and 64-bit Java for all this to work correctly. 32-bit software is not good at dealing with large amounts of memory, and 32-bit Java cannot have a heap size larger than 2GB.

If you upgrade to 4.2.1 or later and reindex, your index size will drop due to compression of certain pieces.
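To make the config suggestions from earlier more concrete, here is a rough sketch of where zkClientTimeout, autowarmCount, and commit frequency live in the config files. The specific numbers (a 30 second timeout, autowarmCount of 16, a 5 minute autoCommit) are illustrative assumptions to experiment with, not tuned recommendations for your index:

```xml
<!-- solr.xml (Solr 4.x legacy format): raise the ZooKeeper client
     timeout. The value is in milliseconds; 30000 is just an example. -->
<cores adminPath="/admin/cores" zkClientTimeout="30000">
  <!-- your existing core definitions, unchanged -->
</cores>

<!-- solrconfig.xml: lower autowarmCount so a new searcher opens faster
     after a commit. The size/autowarmCount values here are placeholders. -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="16"/>

<!-- solrconfig.xml: commit less often. A hard commit every five minutes
     with openSearcher=false flushes to disk without opening a new
     searcher, so it doesn't trigger cache warming. -->
<autoCommit>
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

If you go the openSearcher=false route, remember that new documents only become visible when something does open a searcher, such as a soft commit or an explicit commit from your indexing code.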
Those compressed pieces don't normally affect minimum memory requirements very much, so your free memory requirement will still probably be at least 15GB.

Unless you are using a commercial JVM with low-pause characteristics (like Zing), a heap of 4GB or larger can give you problems with stop-the-world GC pauses. A large heap is unfortunately required with a large index. The default collector that Java gives you is a *terrible* choice for large heaps in general and Solr in particular. Even changing to the CMS collector may not be enough; more tuning is required.

Thanks,
Shawn
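P.S. To illustrate the kind of CMS tuning I mean, here is a sketch of JVM startup options for a 6GB heap. These particular flag values are assumptions to use as a starting point for your own testing, not numbers I have validated against your index:

```shell
# Illustrative JVM options for a Solr start script (Java 6/7 era).
# -Xms/-Xmx equal: avoid heap resizing pauses.
# CMS + ParNew: low-pause collectors instead of the default.
# InitiatingOccupancyFraction: start CMS cycles early (at 70% old gen)
# so the collector doesn't fall behind and trigger a full stop-the-world GC.
JAVA_OPTS="-Xms6g -Xmx6g \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```

Watch your GC logs after changing these; if pauses are still long, the heap size and occupancy fraction are the first knobs to revisit.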