Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips.

________________________________________
From: Shawn Heisey [s...@elyograg.org]
Sent: Friday, September 03, 2010 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr crawls during replication
On 9/2/2010 9:31 AM, Mark wrote:
> Thanks for the suggestions. Our slaves have 12G with 10G dedicated to
> the JVM... too much?
>
> Are the rsync snappuller features still available in 1.4.1? I may try
> that to see if it helps. Configuration of the switches may also be possible.
>
> Also, would you mind explaining your second point... using dual NIC
> cards. How can this be accomplished/configured? Thanks for your help.

I will first admit that I am a relative newbie at this whole thing, so find yourself a grain of salt before you read further...

While it's probably not a bad idea to change to an rsync method and implement bandwidth throttling, I'm betting the real root of your issue is that you're low on memory, which makes your disk cache too small. When you do a replication, the simple act of copying the data shoves the current index completely out of RAM, so when you run a query, it has to go back to the disk (which is now VERY busy) to satisfy it.

Unless you know for sure that you need 10GB dedicated to the JVM, go with a much smaller value: out of the 12GB available, a 10GB heap leaves only about 1.5GB for the OS, assuming the machine has no GUI and no other processes. If you need the JVM that large because you have very large Solr caches, consider reducing their size dramatically. When deciding whether to spend precious memory on the OS disk cache or on Solr caches, the OS disk cache should come first. Additionally, if you pair large Solr caches and large autowarm counts with a small disk cache, you end up with extremely long commit times.

I don't know how the 30GB of data in your index is distributed among the various Lucene files, but for an index that size, I'd want between 8GB and 16GB of RAM available to the OS just for disk caching, and more if possible. If you could get more than 32GB of RAM in the server, your entire index would fit, and it would be very fast.

With a little research, I came up (on my own) with what I think is a decent rule of thumb, and I'm curious what the experts think of this idea: find out how much space is taken by the index files with the following extensions: fnm, fdx, frq, nrm, tii, tis, and tvx. Think of that as a bare-minimum disk cache size, then aim for between 1.5 and 3 times that value for your disk cache, so it can also cache parts of the other files.

Thanks,
Shawn
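As an illustration of the rule of thumb Shawn describes, here is a minimal Python sketch that sums the sizes of files with those extensions in an index directory and reports the suggested 1.5x to 3x disk cache range. The index path is purely hypothetical; adjust it for your install.

    import os

    # File extensions Shawn suggests treating as the "bare minimum" to cache.
    HOT_EXTENSIONS = {".fnm", ".fdx", ".frq", ".nrm", ".tii", ".tis", ".tvx"}

    def hot_file_bytes(index_dir):
        """Sum the sizes of index files with the extensions listed above."""
        total = 0
        for name in os.listdir(index_dir):
            path = os.path.join(index_dir, name)
            _, ext = os.path.splitext(name)
            if os.path.isfile(path) and ext in HOT_EXTENSIONS:
                total += os.path.getsize(path)
        return total

    if __name__ == "__main__":
        index_dir = "/var/solr/data/index"  # hypothetical path; point at your index
        minimum = hot_file_bytes(index_dir)
        gib = 1024 ** 3
        print(f"bare-minimum disk cache: {minimum / gib:.1f} GiB")
        print(f"suggested range: {1.5 * minimum / gib:.1f} - {3 * minimum / gib:.1f} GiB")

Run against a large index, this gives a quick sanity check on how much RAM you would want to leave unallocated (i.e., outside the JVM heap) for the OS to use as disk cache.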
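On the question at the top of the thread: on Linux, at least, the page cache is not something you configure directly. The kernel automatically uses whatever RAM is otherwise free to cache file data and gives it back when applications need it, so the main knob is simply how much memory you leave unallocated (for example, by keeping the JVM heap modest). A small, Linux-specific sketch that reads /proc/meminfo shows how much RAM is currently being used for caching:

    def meminfo_kib():
        """Parse /proc/meminfo into a dict of values in kiB (Linux only)."""
        values = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                values[key] = int(rest.strip().split()[0])
        return values

    if __name__ == "__main__":
        m = meminfo_kib()
        for key in ("MemTotal", "MemFree", "Buffers", "Cached"):
            print(f"{key:>9}: {m[key] / (1024 * 1024):.1f} GiB")

Watching the "Cached" figure before, during, and after a replication should make the behavior Shawn describes visible: the copy fills the cache with replicated data and pushes the previously cached index segments out.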