Thanks, Toke. Your input has been informative and valuable. I will go through the links you provided and let you know what we end up going with.

On Sat, Dec 5, 2015 at 5:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Gaurav Patel <gaura...@gmail.com> wrote:
> > 3 Physical Machines with 60 cpu cores and 512 GB RAM each.
> > EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.
>
> We have experimented a little bit with smaller machines, backed by EMC
> Isilon over NFS. That worked surprisingly well, but ultimately did not
> scale for us as we could not justify paying for enterprise SSDs for the
> Isilon. There is a write-up at
> https://sbdevel.wordpress.com/2013/12/06/danish-webscale/
>
> > Can we use solr cloud for this setup?
>
> Yes. That is independent of the backing storage.
>
> > How many instances of SOLR are recommended per physical machines
> > and how much ram should be allocated to it.
>
> "That depends".
>
> http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> The amount of RAM for JVMs should be whatever is needed. Or to put it
> another way: There are some explicitly configured internal caches in Solr,
> but just setting Xmx to a very high number will not help performance. On
> the contrary, it will lead to long garbage collecting pauses and eat from
> the precious disk cache.
>
> There are some rules of thumb for running Solr, but my own meta rule of
> thumbs is that their applicability goes down when scale goes up. One of the
> rules of thumb is to have 1 Solr instance per machine. But running JVMs
> with very large heaps (100GB+) has the potential of extremely long garbage
> collection pauses and also implies a larger memory overhead due to internal
> pointer size.
>
> > Should zookeeper be installed along with solr on each box or should be
> > installed in separate 2 Virtual machines by itself?
>
> I have no opinion on that.
>
> > Can we run kakfa and cassandra along with solr on each physical machine?
>
> Sure, but they will of course compete with Solr for resources.
>
> > Anybody running Solr with HDFS in production?
>
> It is a recurring theme on this mailing list at least. It can be searched
> at
> https://www.mail-archive.com/solr-user@lucene.apache.org/
>
> - Toke Eskildsen