I was playing with this thought because I want to find a way to easily rebalance hardware resources. Moving replicas around takes a lot of time, because we serve many shards (25 and counting) and they are large (50GB each). If all replicas could share a single fileset, it would be very easy to set up new replicas and move them around.
We are using Solr 3.6, and we are of course aware of the activity around SolrCloud and ElasticSearch. My problem is that we are not likely to migrate within the first year, and I need a solution for easier administration in the meantime.

The files could be placed on a SAN with very high bandwidth. I understand that in practice a SAN can compete with local hard drives when it comes to bandwidth. So, assuming I/O is not the bottleneck, what other problems would arise? I can think of some:

* A replica takes a file lock, effectively preventing other instances from accessing the index. This would be a showstopper in itself, unless there is a workaround.
* A replica's file list is held in memory, so instant synchronization is not possible: the replica thinks it should copy the delta from the master.

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K
Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email christian.sonne.jen...@infopaq.com
Web www.infopaq.com
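On the file-lock point above: in Solr 3.x the Lucene lock factory is configurable per core in solrconfig.xml via `<lockType>`, and the value `none` disables write locking entirely, which would let a read-only instance open an index that another process also has open. A minimal sketch, assuming the shared-fileset scenario where only one dedicated writer ever modifies the index (the element placement shown follows the 3.x `<mainIndex>` section):

```xml
<!-- solrconfig.xml fragment for a read-only replica (illustrative sketch).
     lockType "none" removes the Lucene write lock, so this is only safe if
     this instance never writes to the shared index directory. -->
<mainIndex>
  <lockType>none</lockType>
</mainIndex>
```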
From: Erick Erickson <erickerick...@gmail.com>
Reply-To: solr-user@lucene.apache.org
Date: Thu, 30 Aug 2012 03:33:09 +0200
To: solr-user@lucene.apache.org
Subject: Re: Solr Shard Replicas sharing files

Possible, kinda maybe. But then all of the SolrCloud goodness that's there for HA/DR goes out the window, because the shared index (actually the hardware it's on) becomes a single point of failure.

On the other hand, you're using the word "replica" but not explicitly talking about SolrCloud, so I guess this is just about standard master/slave setups. There the answer is that it's generally not a great idea to share indexes like this. Disk I/O becomes your bottleneck, with all those slaves asking to pull what they need off the disk at once every time a commit happens, compounded with network latency. But I have to ask: is this just a theoretical question, or is it really something you're having trouble with in production?

And the idea of a "replication tree", where N slaves get their index from the master and then M slaves get their index from the first N slaves, sounds like a "repeater" setup; see:
http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

Best
Erick

On Wed, Aug 29, 2012 at 4:23 AM, Christian von Wendt-Jensen <christian.sonne.jen...@infopaq.com> wrote:

Hi,

I was wondering if it was possible to let all replicas of a shard share the physical Lucene files. That way you would only need one set of files on shared storage, and could then set up as many replicas as needed without copying files around. This would make it very fast to optimize and rebalance hardware resources as more shards are added.

What I was envisioning was a setup with one master doing all the indexing.
Then all the shard replicas are installed as a chain of slaves, each set up as both master and slave, such that the first replica replicates directly from the master, the next replica replicates from the first replica, and so on. In this way only the first replica needs to write index files. When the next replica is triggered to replicate, it will find that all files are already up to date; you then issue a "commit" to reload the index in memory, making it current. The master's commit triggers a cascade of replications, all of which are up to date immediately, and it is then a matter of a few seconds for the slaves to be in sync with the master.

Taking this thought further, the first replica could actually access the master's index files directly, and thereby be up to date without copying any files.

Would this setup be possible?

Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
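The both-master-and-slave chain described above corresponds to the repeater configuration from the SolrReplication wiki page Erick links to: a node declares both a master and a slave section in its ReplicationHandler. A sketch of one chain node, with the host name, port, and poll interval as illustrative placeholders:

```xml
<!-- solrconfig.xml fragment for one node in the replication chain (sketch,
     per the SolrReplication wiki repeater setup; upstream-host is a placeholder).
     As a slave it polls its upstream node; as a master it serves the next
     node in the chain after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://upstream-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Each node in the chain uses the same handler definition, differing only in `masterUrl`, which points at the node one step closer to the true master.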