Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips.

________________________________________
From: Shawn Heisey [s...@elyograg.org]
Sent: Friday, September 03, 2010 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr crawls during replication
On 9/2/2010 9:31 AM, Mark wrote:
> Thanks for the suggestions. Our slaves have 12G with 10G dedicated to
> the JVM... too much?
>
> Are the rsync snappuller features still available in 1.4.1? I may try
> that to see if it helps. Configuration of the switches may also be possible.
>
> Also, would you mind explaining your second point... using dual NIC
> cards. How can this be accomplished/configured? Thanks for your help.

I will first admit that I am a relative newbie at this whole thing, so find yourself a grain of salt before you read further...

While it's probably not a bad idea to change to an rsync method and implement bandwidth throttling, I'm betting the real root of your issue is that you're low on memory, which makes your disk cache too small. When you do a replication, the simple act of copying the data shoves the current index completely out of RAM, so when you run a query, it has to go back to the disk (which is now VERY busy) to satisfy it.

Unless you know for sure that you need 10GB dedicated to the JVM, go with a much smaller value: out of the 12GB available, a 10GB heap leaves only about 1.5GB for the OS, assuming the machine has no GUI and no other processes. If you need the JVM that large because you have very large Solr caches, consider reducing their size dramatically. When deciding whether to spend precious memory on the OS disk cache or on Solr caches, the OS disk cache should come first. Additionally, if you pair large Solr caches and large autowarm counts with a small disk cache, you end up with extremely long commit times.

I don't know how the 30GB of data in your index is distributed among the various Lucene files, but for an index that size, I'd want between 8GB and 16GB of RAM available to the OS just for disk caching, and more if possible. If you could get more than 32GB of RAM in the server, your entire index would fit, and it would be very fast.

With a little research, I came up (on my own) with what I think is a decent rule of thumb, and I'm curious what the experts think of this idea: find out how much space is taken by the index files with the following extensions: fnm, fdx, frq, nrm, tii, tis, and tvx. Think of that as a bare-minimum disk cache size, then aim for between 1.5 and 3 times that value for your disk cache, so it can also cache parts of the other files.

Thanks,
Shawn
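As an illustration of the rule of thumb Shawn describes, here is a minimal Python sketch that sums the sizes of files with those extensions in an index directory and reports the suggested 1.5x to 3x disk cache range. The index path is purely hypothetical; adjust it for your install.

    import os

    # File extensions Shawn suggests treating as the "bare minimum" to cache.
    HOT_EXTENSIONS = {".fnm", ".fdx", ".frq", ".nrm", ".tii", ".tis", ".tvx"}

    def hot_file_bytes(index_dir):
        """Sum the sizes of index files with the extensions listed above."""
        total = 0
        for name in os.listdir(index_dir):
            path = os.path.join(index_dir, name)
            _, ext = os.path.splitext(name)
            if os.path.isfile(path) and ext in HOT_EXTENSIONS:
                total += os.path.getsize(path)
        return total

    if __name__ == "__main__":
        index_dir = "/var/solr/data/index"  # hypothetical path; point at your index
        minimum = hot_file_bytes(index_dir)
        gib = 1024 ** 3
        print(f"bare-minimum disk cache: {minimum / gib:.1f} GiB")
        print(f"suggested range: {1.5 * minimum / gib:.1f} - {3 * minimum / gib:.1f} GiB")

Run against a large index, this gives a quick sanity check on how much RAM you would want to leave unallocated (i.e., outside the JVM heap) for the OS to use as disk cache.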
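On the question at the top of the thread: on Linux, at least, the page cache is not something you configure directly. The kernel automatically uses whatever RAM is otherwise free to cache file data and gives it back when applications need it, so the main knob is simply how much memory you leave unallocated (for example, by keeping the JVM heap modest). A small, Linux-specific sketch that reads /proc/meminfo shows how much RAM is currently being used for caching:

    def meminfo_kib():
        """Parse /proc/meminfo into a dict of values in kiB (Linux only)."""
        values = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                values[key] = int(rest.strip().split()[0])
        return values

    if __name__ == "__main__":
        m = meminfo_kib()
        for key in ("MemTotal", "MemFree", "Buffers", "Cached"):
            print(f"{key:>9}: {m[key] / (1024 * 1024):.1f} GiB")

Watching the "Cached" figure before, during, and after a replication should make the behavior Shawn describes visible: the copy fills the cache with replicated data and pushes the previously cached index segments out.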