bq:  Should zookeeper be installed along with solr on each box, or should it be
      installed by itself in 2 separate virtual machines?

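Zookeeper should be installed on an odd number of machines, as it requires
a quorum, which is (number of zookeeper nodes)/2 + 1, with the division
rounded down. With two ZKs the quorum is two, so if either of them fails you
fall below quorum; you're actually _more_ likely to fall below quorum than
with one ZK. More generally, an even number of ZK nodes tolerates no more
failures than one fewer node would (4 nodes survive 1 failure, just like 3)
while giving you one more node that can fail, so it is more likely to fall
below quorum than an odd number.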
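A minimal Python sketch of the quorum arithmetic above. It assumes each ZK node is unavailable independently with the same probability p (a simplification; real failures are often correlated), and the value of p is made up for illustration:

```python
from math import comb


def quorum(n: int) -> int:
    """Smallest majority of an n-node ensemble: floor(n/2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be down while a majority is still up."""
    return n - quorum(n)


def p_lose_quorum(n: int, p: float) -> float:
    """Probability that fewer than quorum(n) nodes are up, assuming each node
    is down independently with probability p (a simplifying assumption)."""
    q = quorum(n)
    # Sum over k = number of nodes that are up, for k = 0 .. q-1.
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(q))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node unavailability
    for n in range(1, 7):
        print(f"{n} ZK nodes: quorum={quorum(n)}, "
              f"survives {tolerated_failures(n)} failure(s), "
              f"P(no quorum)={p_lose_quorum(n, p):.6f}")
```

Running it shows that 2 nodes survive no failures, same as 1, but have roughly twice the chance of losing quorum; likewise 4 nodes survive one failure, same as 3, with a higher chance of losing quorum.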

But your question seems odd on another level. The number of Zookeepers you
run is entirely independent of the number of Solr nodes. You should just pick a
number of Zookeepers and install them. Unless your installation is large,
3 Zookeepers is usually enough.
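For illustration, a small Python sketch that renders a 3-node ensemble config and the connection string the Solr nodes would use. The hostnames (zk1..zk3), the dataDir and the tickTime/initLimit/syncLimit values are assumptions (the latter taken from ZooKeeper's sample config); 2181, 2888 and 3888 are ZooKeeper's conventional client, peer and leader-election ports. It refuses an even host count as a policy choice following the advice above (ZooKeeper itself would accept one). Note that nothing in it depends on how many Solr nodes there are:

```python
def zoo_cfg(hosts: list[str], data_dir: str = "/var/lib/zookeeper",
            client_port: int = 2181) -> str:
    """Render a zoo.cfg shared by every member of the ensemble."""
    if len(hosts) % 2 == 0:
        raise ValueError("use an odd number of ZooKeeper hosts")
    lines = [
        "tickTime=2000",  # these three values follow ZooKeeper's sample config
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N=host:peerPort:leaderElectionPort; host N also needs a file
    # named myid in its dataDir containing just N.
    for i, host in enumerate(hosts, start=1):
        lines.append(f"server.{i}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def zk_host(hosts: list[str], client_port: int = 2181, chroot: str = "") -> str:
    """Connection string every Solr node gets, however many Solr nodes exist."""
    return ",".join(f"{h}:{client_port}" for h in hosts) + chroot


if __name__ == "__main__":
    ensemble = ["zk1", "zk2", "zk3"]  # hypothetical hostnames
    print(zoo_cfg(ensemble))
    print(zk_host(ensemble, chroot="/solr"))
```

Every Solr node, whether there are 3 or 30 of them, is simply pointed at the same zkHost string (for example via bin/solr's -z option or ZK_HOST in solr.in.sh).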

Whether they go on their own VMs or not isn't as interesting as whether they
go on separate physical boxes. That way if someone pulls the plug on a whole
physical server, you still have a quorum, whereas if you put two ZKs on a
single physical box (for instance as two VMs), you can lose quorum if that one
machine goes down.
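To make the physical-box point concrete, a hedged Python sketch that checks whether a given placement of ZK nodes (VMs or not) onto physical hosts keeps quorum when any one host dies. The node and host names are hypothetical, and it only models losing one whole physical box at a time:

```python
from collections import Counter


def survives_any_single_host_loss(placement: dict[str, str]) -> bool:
    """placement maps each ZK node name to the physical host it runs on.

    Returns True if the ensemble keeps a quorum no matter which single
    physical host goes down (taking all of its VMs with it)."""
    n = len(placement)
    quorum = n // 2 + 1
    nodes_per_host = Counter(placement.values())
    return all(n - lost >= quorum for lost in nodes_per_host.values())


if __name__ == "__main__":
    layouts = {
        "3 ZKs, one per box": {"zk1": "boxA", "zk2": "boxB", "zk3": "boxC"},
        "3 ZKs, two VMs on one box": {"zk1": "boxA", "zk2": "boxA", "zk3": "boxB"},
        "2 ZK VMs on two boxes": {"zk1": "boxA", "zk2": "boxB"},
    }
    for name, placement in layouts.items():
        print(f"{name}: {survives_any_single_host_loss(placement)}")
```

Only the first layout survives the loss of any one box; the two-VM layout from the original question fails even though the VMs are on different boxes, because a 2-node ensemble has no failure tolerance at all.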

Now, all that said, it's perfectly possible to run with just a _single_
zookeeper running on a single box that may or may not also be running Solr.
The risk is that your cluster will be unable to index documents (though it
may still be able to serve searches) if that ZK becomes unavailable for any
reason. I run this way all the time for development.

In a small installation where all your servers are physically located
together and you run with a single ZK node, if that ZK node regularly becomes
unavailable, you have problems that adding more ZK nodes probably won't help
with ;)

Best,
Erick

On Sat, Dec 5, 2015 at 4:09 AM, Gaurav Patel <gaura...@gmail.com> wrote:
> Thanks Toke.  Your input has been informative and valuable.
> I will go through the links you provided and will let you know what we end
> up doing.
>
> On Sat, Dec 5, 2015 at 5:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
>> Gaurav Patel <gaura...@gmail.com> wrote:
>> > 3 Physical Machines with 60 cpu cores and 512 GB RAM each.
>> > EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.
>>
>> We have experimented a little bit with smaller machines, backed by EMC
>> Isilon over NFS. That worked surprisingly well, but ultimately did not
>> scale for us as we could not justify paying for enterprise SSDs for the
>> Isilon. There is a write-up at
>> https://sbdevel.wordpress.com/2013/12/06/danish-webscale/
>>
>> > Can we use solr cloud for this setup?
>>
>> Yes. That is independent of the backing storage.
>>
>> > How many instances of SOLR are recommended per physical machine,
>> > and how much RAM should be allocated to each?
>>
>> "That depends".
>>
>> http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> The amount of RAM for JVMs should be whatever is needed. Or to put it
>> another way: there are some explicitly configured internal caches in Solr,
>> but just setting Xmx to a very high number will not help performance. On
>> the contrary, it will lead to long garbage collection pauses and eat into
>> the precious disk cache.
>>
>> There are some rules of thumb for running Solr, but my own meta rule of
>> thumb is that their applicability goes down when scale goes up. One of the
>> rules of thumb is to have 1 Solr instance per machine. But running JVMs
>> with very large heaps (100GB+) has the potential for extremely long garbage
>> collection pauses and also implies a larger memory overhead due to internal
>> pointer size (above roughly 32GB of heap, the JVM can no longer use
>> compressed object pointers).
>>
>> > Should zookeeper be installed along with solr on each box, or should it
>> > be installed by itself in 2 separate virtual machines?
>>
>> I have no opinion on that.
>>
>> > Can we run Kafka and Cassandra along with Solr on each physical machine?
>>
>> Sure, but they will of course compete with Solr for resources.
>>
>> > Anybody running Solr with HDFS in production?
>>
>> It is a recurring theme on this mailing list, at least. The archive can be
>> searched at
>> https://www.mail-archive.com/solr-user@lucene.apache.org/
>>
>> - Toke Eskildsen
>>

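Toke's point about heap size versus disk cache can be turned into a rough per-machine memory budget. A Python sketch, where the 8GB OS reserve, the 64GB Kafka/Cassandra allowance, the 300GB index size and the heap sizes are all hypothetical placeholders, the ~32GB compressed-pointer cutoff is approximate, and JVM off-heap overhead is ignored:

```python
# Approximate: HotSpot stops using compressed object pointers a little
# below 32GB of heap; the exact cutoff depends on the JVM.
COMPRESSED_OOPS_LIMIT_GB = 32


def page_cache_left(total_ram_gb: float, instances: int, heap_gb: float,
                    other_services_gb: float = 0, os_reserve_gb: float = 8) -> float:
    """RAM (GB) left for the OS disk cache after Solr heaps, other services
    (e.g. Kafka, Cassandra) and a hypothetical OS reserve are subtracted."""
    used = instances * heap_gb + other_services_gb + os_reserve_gb
    return total_ram_gb - used


def plan(total_ram_gb: float, index_gb: float, instances: int, heap_gb: float,
         other_services_gb: float = 0) -> tuple[float, list[str]]:
    """Return (disk cache GB, warnings) for one machine's layout."""
    warnings = []
    if heap_gb >= COMPRESSED_OOPS_LIMIT_GB:
        warnings.append("heap >= ~32GB: no compressed pointers, more overhead per object")
    cache = page_cache_left(total_ram_gb, instances, heap_gb, other_services_gb)
    if cache < 0:
        warnings.append("over-committed: heaps and other services exceed physical RAM")
    elif cache < index_gb:
        warnings.append(f"only {cache:g}GB of disk cache for {index_gb:g}GB of index")
    return cache, warnings


if __name__ == "__main__":
    # 512GB machine from the thread; everything else is a made-up example.
    for instances, heap in [(1, 200), (4, 24)]:
        cache, warns = plan(512, index_gb=300, instances=instances,
                            heap_gb=heap, other_services_gb=64)
        print(f"{instances} x {heap}GB heap -> {cache:g}GB cache", warns or "ok")
```

With these made-up numbers, one 200GB heap leaves less cache than the index size and loses compressed pointers, while four 24GB heaps leave more cache than the index; whether either layout suits a real workload can only be settled by testing, as the sizing article linked above argues.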