On 9/21/2016 7:52 AM, Kyle Daving wrote:
> We are currently running solr 5.2.1 and attempted to upgrade to 6.2.1.
> We attempted this last week but ran into disk access latency problems
> so reverted back to 5.2.1. We found that after upgrading we overran
> the NVRAM on our SAN and caused a fairly large queue depth for disk
> access (we did not have this problem in 5.2.1). We reached out to our
> SAN vendor and they said that it was due to the size of our optimized
> indexes. It is not uncommon for us to have roughly 300GB single file
> optimized indexes. Our SAN vendor advised that splitting the index
> into smaller fragmented chunks would alleviate the NVRAM/queue depth
> problem. 

How is this filesystem presented to the server?  Is it a block device
using a protocol like iSCSI, or is it a network filesystem like NFS or
SMB?  A filesystem on a block device appears to the OS as if it were
completely local, so local machine memory is used to cache its data.  A
network filesystem usually relies on memory in the storage device for
caching, and those devices typically do not have a lot of memory
compared to the amount of storage space they provide.
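If you're not sure, a quick way to check is to look at the filesystem
type under the index directory.  Below is a rough sketch of that check.
It assumes a Linux host and reads /proc/mounts, and the path in it is
only a placeholder for your real Solr data directory:

#!/usr/bin/env python3
# Rough check of how the filesystem under a Solr index is presented.
# Assumes Linux; INDEX_PATH is a placeholder for your real data directory.
import os

INDEX_PATH = "/var/solr/data"
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "glusterfs"}

def mount_for(path):
    # Find the /proc/mounts entry whose mountpoint is the longest
    # prefix of the given path.
    best = ("", "unknown")
    real = os.path.realpath(path)
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpoint, fstype = line.split()[:3]
            if real.startswith(mountpoint) and len(mountpoint) > len(best[0]):
                best = (mountpoint, fstype)
    return best

mountpoint, fstype = mount_for(INDEX_PATH)
if fstype in NETWORK_FS:
    print(f"{INDEX_PATH}: network filesystem ({fstype}), cached on the storage device")
else:
    print(f"{INDEX_PATH}: {fstype} mounted at {mountpoint}, looks local to the OS")

If the type comes back as nfs or cifs, the caching burden falls on the
storage device; if it's something like xfs or ext4, the OS treats it as
local and the local page cache applies.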

> Why do we not see this problem with the same size index in 5.2.1? Did
> solr change the way it accesses disk in v5 vs v6? 

It's hard to say why you didn't have the problem with the earlier version.

All the index disk access is handled by Lucene, and from Solr's point of
view, it's a black box, with only minimal configuration available. 
Lucene is constantly being improved, but those improvements assume the
general best-case installation -- a machine with a local filesystem and
plenty of spare memory to effectively cache the data that filesystem
contains.
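To give an idea of what that black box does:  on 64-bit systems, Lucene's
default directory implementation (MMapDirectory) memory-maps the segment
files, so reads come out of the OS page cache when there is spare RAM
and hit the actual disk when there isn't.  Here's a tiny sketch of that
mechanism, not of Lucene itself; the file path is only a stand-in for a
real segment file:

#!/usr/bin/env python3
# Illustration of memory-mapped reads, the mechanism Lucene's default
# MMapDirectory relies on.  SEGMENT_FILE is a hypothetical placeholder.
import mmap

SEGMENT_FILE = "/var/solr/data/index/_0.cfs"

with open(SEGMENT_FILE, "rb") as f:
    # Map the file instead of read()ing it: the kernel decides whether
    # the bytes come from already-cached pages or from the disk.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(f"first 16 bytes of {SEGMENT_FILE}: {mm[:16].hex()}")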

> Is there a configuration file we should be looking at making
> adjustments in? 

Until we can figure out why the problem is happening, this question
cannot be answered.

> Since everything worked fine in 5.2.1 there has to be something we are
> overlooking when trying to use 6.2.1. Any comments and thoughts are
> appreciated.

Best guess (which could be wrong):  There's not enough memory to
effectively cache the data in the Lucene indexes.  A newer version of
Solr generally has *better* performance characteristics than an earlier
version, but *ONLY* if there's enough memory available to cache the
index data, which ensures that data can be accessed very quickly.  When
the actual disk must be read, access will be slow ... and that problem
can get worse with a different version.
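If you want a rough sanity check on that guess, something like the
sketch below will do it.  It assumes Linux, and both the index path and
the heap size are placeholders you would fill in yourself; it just adds
up the segment files and compares them with the RAM left over after the
Java heap:

#!/usr/bin/env python3
# Rough estimate of whether the OS page cache can hold the whole index.
# INDEX_PATH and HEAP_GB are placeholders; fill in your own values.
import os

INDEX_PATH = "/var/solr/data"   # placeholder: real index directory
HEAP_GB = 8                     # placeholder: the -Xmx value given to Solr

def dir_size_bytes(path):
    # Total size of every file under the index directory.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return total

def mem_total_bytes():
    # MemTotal from /proc/meminfo, converted from kB to bytes.
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    return 0

index_gb = dir_size_bytes(INDEX_PATH) / 2**30
cache_gb = mem_total_bytes() / 2**30 - HEAP_GB
print(f"index size: {index_gb:.1f} GB, RAM left for page cache: {cache_gb:.1f} GB")
if cache_gb < index_gb:
    print("the OS cannot cache the whole index, so real disk reads will happen")

The whole index doesn't have to fit, but when the cached fraction is
small, query and merge activity turns into real disk I/O, which is
exactly what a SAN will feel.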

How much memory is in your Solr server, and how much is assigned to the
Java heap for Solr?  Are you running more than one Solr instance per server?

When you're dealing with a remote filesystem on a SAN, exactly where to
add memory to boost performance will depend on how the filesystem is
being presented.

I strongly recommend against using a network filesystem like NFS or SMB
to hold a Solr index.  Solr works best when the filesystem is local to
the server and there's plenty of extra memory for caching.  The amount
of memory required for good performance with a 300GB index will be
substantial.

Thanks,
Shawn
