I was playing with this thought because I want to find a way to easily rebalance hardware resources. Moving replicas around takes a lot of time, because we serve many shards (25 and counting) and they are large (50GB each). If all replicas could share a single fileset, it would be very easy to set up new replicas and move them around.
We are using Solr 3.6, and we are of course aware of the activity around SolrCloud and ElasticSearch. My problem is that we are not likely to migrate within the first year, and I need a solution for easier administration in the meantime.

The files could be placed on a SAN with very high bandwidth. I understand that in practice a SAN can compete with local hard drives when it comes to bandwidth. So, assuming I/O is not the bottleneck, what other problems would arise? I can think of some:

* A replica takes a file lock, effectively preventing other instances from accessing the index. This would be a showstopper in itself, unless there is a workaround.
* A replica's file list is held in memory, so instant synchronization is not possible: the replica thinks it should copy the delta from the master.

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K
Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email christian.sonne.jen...@infopaq.com
Web www.infopaq.com
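On the file-lock point above: in Solr 3.x the Lucene lock factory is configurable per core in solrconfig.xml via `<lockType>`, and the value `none` disables write locking entirely, which would let a read-only instance open an index that another process also has open. A minimal sketch, assuming the shared-fileset scenario where only one dedicated writer ever modifies the index (the element placement shown follows the 3.x `<mainIndex>` section):

```xml
<!-- solrconfig.xml fragment for a read-only replica (illustrative sketch).
     lockType "none" removes the Lucene write lock, so this is only safe if
     this instance never writes to the shared index directory. -->
<mainIndex>
  <lockType>none</lockType>
</mainIndex>
```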
From: Erick Erickson <erickerick...@gmail.com>
Reply-To: solr-user@lucene.apache.org
Date: Thu, 30 Aug 2012 03:33:09 +0200
To: solr-user@lucene.apache.org
Subject: Re: Solr Shard Replicas sharing files

Possible, kinda maybe. But then all of the SolrCloud goodness that's there for HA/DR goes out the window, because the shared index (actually the hardware it's on) becomes a single point of failure.

On the other hand, you're using the word "replica" but not explicitly talking about SolrCloud, so I guess this is just about standard master/slave setups. There the answer is that it's generally not a great idea to share indexes like this. Disk I/O becomes your bottleneck, with all those slaves asking to pull what they need off the disk at once every time a commit happens, compounded with network latency. But I have to ask: is this just a theoretical question, or is it really something you're having trouble with in production?

And the idea of a "replication tree", where N slaves get their index from the master and then M slaves get their index from the first N slaves, sounds like a "repeater" setup; see:
http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

Best
Erick

On Wed, Aug 29, 2012 at 4:23 AM, Christian von Wendt-Jensen <christian.sonne.jen...@infopaq.com> wrote:

Hi,

I was wondering if it was possible to let all replicas of a shard share the physical Lucene files. That way you would only need one set of files on shared storage, and could then set up as many replicas as needed without copying files around. This would make it very fast to optimize and rebalance hardware resources as more shards are added.

What I was envisioning was a setup with one master doing all the indexing.
Then all the shard replicas are installed as a chain of slaves, each set up as both master and slave, such that the first replica replicates directly from the master, the next replica replicates from the first replica, and so on. In this way only the first replica needs to write index files. When the next replica is triggered to replicate, it will find that all files are already up to date; you then issue a "commit" to reload the index in memory, making it current. The master's commit triggers a cascade of replications, all of which are up to date immediately, and it is then a matter of a few seconds for the slaves to be in sync with the master.

Taking this thought further, the first replica could actually access the master's index files directly, and thereby be up to date without copying any files.

Would this setup be possible?

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
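The both-master-and-slave chain described above corresponds to the repeater configuration from the SolrReplication wiki page Erick links to: a node declares both a master and a slave section in its ReplicationHandler. A sketch of one chain node, with the host name, port, and poll interval as illustrative placeholders:

```xml
<!-- solrconfig.xml fragment for one node in the replication chain (sketch,
     per the SolrReplication wiki repeater setup; upstream-host is a placeholder).
     As a slave it polls its upstream node; as a master it serves the next
     node in the chain after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://upstream-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Each node in the chain uses the same handler definition, differing only in `masterUrl`, which points at the node one step closer to the true master.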