The process we use to signal the read-only servers is to submit a CREATE request pointing to the newly created index, with a name like corebak, then submit a SWAP request between core and corebak, and finally submit an UNLOAD request for corebak, which now points at the previous version of the index.
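
For reference, the sequence of CoreAdmin calls looks roughly like the
following; the host, core names (core, corebak), and dataDir here are just
placeholders for our actual values:

    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=corebak&instanceDir=core&dataDir=/mnt/nas/index-new"
    curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=core&other=corebak"
    curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=corebak"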

The individual servers cannot do a merge on their own, since they mount the NAS read-only; nothing they do will affect the index. I believe this allows each machine to cache much of the index in memory, with no fear that its cache will be invalidated by one of the others.
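
For completeness, "mount the NAS read-only" is just an ordinary read-only NFS
mount; a hypothetical fstab entry (host and paths made up for illustration)
would be something like:

    nas01:/export/solr-index  /mnt/solr-index  nfs  ro,noatime  0  0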

-Bob Haschart
University of Virginia Library



On 5/26/2017 12:52 PM, David Hastings wrote:
I'm curious about this.  When you say "and signal the three Solr servers
when the updated index is available," how does it send the signal?  I.e.,
what command, just a reload?  Also, what prevents them from doing a merge on
their own?  Thanks

On Fri, May 26, 2017 at 12:09 PM, Robert Haschart <rh...@virginia.edu>
wrote:

We have run this exact scenario for several years.  We have three Solr
servers sitting behind a load balancer, with all three accessing the same
Solr index stored on read-only network-attached storage (NAS).  A fourth
machine is used to update the index (typically daily) and signal the three
Solr servers when the updated index is available.  Our index is primarily
bibliographic information; it contains about 8 million documents and is
about 30GB in size.  We've used this configuration since before ZooKeeper
and SolrCloud, or even Java-based master/slave replication, were available.
I cannot say whether this configuration has any benefits over the currently
accepted way of load balancing, but it has worked well for us for several
years and we've never had a corrupted-index problem.


-Bob Haschart
University of Virginia Library



On 5/23/2017 10:05 PM, Shawn Heisey wrote:

On 5/19/2017 8:33 AM, Ravi Kumar Taminidi wrote:

Hello.  Scenario: currently we have two Solr servers running on two
different machines (Linux).  Is there any way we can make the core be
located on a NAS or a network shared drive so that both Solr instances use
the same index?

Let me know if there are any performance issues; our index is approximately
1GB in size.

I think it's a very bad idea to try to share indexes between multiple
Solr instances.  You can override the locking and get it to work, and
you may be able to find advice on the Internet about how to do it.  I
can tell you that it's outside the design intent for both Lucene and
Solr.  Lucene works aggressively to *prevent* multiple processes from
sharing an index.
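
For context on "override the locking": what people typically do (which,
again, I don't recommend) is relax the lockType in the indexConfig section
of solrconfig.xml, something along these lines:

    <indexConfig>
      <lockType>none</lockType>
    </indexConfig>

The default is "native", which is precisely what stops a second process from
opening the same index for writing.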

In general, network storage is not a good idea for Solr.  There's added
latency for accessing any data, and frequently the filesystem won't
support the kind of locking that Lucene wants to use, but the biggest
potential problem is disk caching.  Solr/Lucene is absolutely reliant on
disk caching in the Solr server's local memory for good performance.  If
the network filesystem cannot be cached by the client that has mounted
the storage, which I believe is the case for most network filesystem
types, then you're reliant on disk caching in the network server(s).
For VERY large indexes, which is really the only viable use case I can
imagine for network storage, it is highly unlikely that the network
server(s) will have enough memory to effectively cache the data.

Solr has explicit support for HDFS storage, but as I understand it, HDFS
includes the ability for a client to allocate memory that gets used
exclusively for caching on the client side, which allows HDFS to
function like a local filesystem in ways that I don't think NFS can.
Getting back to my advice about not sharing indexes -- even with
SolrCloud on HDFS, multiple replicas generally do NOT share an index.
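
For reference, the HDFS support is configured through the directoryFactory
in solrconfig.xml, and the client-side cache I mentioned is its block cache.
A rough sketch (the HDFS path and cache sizing below are placeholders; check
the Reference Guide for your version) would look something like:

    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <bool name="solr.hdfs.blockcache.enabled">true</bool>
      <int name="solr.hdfs.blockcache.slab.count">1</int>
    </directoryFactory>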

A 1GB index is very small, so there's no good reason I can think of to
involve network storage.  I would strongly recommend local storage, and
you should abandon any attempt to share the same index data between more
than one Solr instance.

Thanks,
Shawn


