Re: Solr4 cluster setup for high performance reads

2013-06-21 Thread Utkarsh Sengar
Thanks for the update, guys. I am working on the suggestions you shared. One last question about the solrcloud setup: what is the recommended cluster size for solrcloud? I have 3 nodes of solr and 3 nodes of ZK (running on the same machines, but in a different JVM). And after 2-3 days I notice that

Re: Solr4 cluster setup for high performance reads

2013-06-13 Thread Shawn Heisey
On 6/13/2013 7:51 PM, Utkarsh Sengar wrote:
> Sure, I will reduce the count and see how it goes. The problem I have is,
> after such a change, I need to reindex everything again, which again is
> slow and takes time (40-60hours).

There should be no need to reindex after changing most things in sol

Re: Solr4 cluster setup for high performance reads

2013-06-13 Thread Otis Gospodnetic
Hi, Changing cache sizes doesn't require reindexing. You have high IO Wait - waiting on your disks? Ideally your index will be cached. Lower those caches, possibly reduce heap size, and leave more RAM to the OS for caching, and IO Wait will hopefully go down. I'd try with just -Xmx4g and see. Tha

Re: Solr4 cluster setup for high performance reads

2013-06-13 Thread Utkarsh Sengar
Otis, Shawn, Thanks for the reply. You can find my schema.xml and solrconfig.xml here: https://gist.github.com/utkarsh2012/5778811 To answer your questions: > Those are massive caches. Rethink their size. More specifically, plug in some monitoring tool and see what you are getting out of them. Just

Re: Solr4 cluster setup for high performance reads

2013-06-13 Thread Shawn Heisey
On 6/13/2013 5:53 PM, Utkarsh Sengar wrote:
> *Problems:*
> The initial training pulls 2000 documents from solr to find the most
> probable matches and calculates score (PMI/NPMI). This query is extremely
> slow. Also, a regular query also takes 3-4 seconds.
> I am running solr currently on just on

Re: Solr4 cluster setup for high performance reads

2013-06-13 Thread Otis Gospodnetic
Hi, Hard to tell, but here are some tips: * Those are massive caches. Rethink their size. More specifically, plug in some monitoring tool and see what you are getting out of them. Just today I looked at the caches of one of Sematext's clients - 200K entries, 0 evictions ==> needless waste of JVM heap.
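
The cache check described in this tip can be sketched roughly as follows. This is only an illustration, assuming the cache statistics have already been copied out of whatever monitoring tool is in use. The `CacheStats` fields, the `classify` helper, and the 50% thresholds are hypothetical choices made for this example and are not part of any Solr API.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counters for one cache, copied by hand from a monitoring tool (assumed layout)."""
    name: str
    max_size: int   # configured upper bound on entries
    entries: int    # entries currently held
    lookups: int    # total lookups since the cache was created
    hits: int       # lookups answered from the cache
    evictions: int  # entries pushed out because the cache was full


def classify(stats: CacheStats) -> str:
    """Return 'oversized', 'low_value', or 'ok' using rough, assumed thresholds."""
    hit_ratio = stats.hits / stats.lookups if stats.lookups else 0.0
    # Never evicting and sitting under half full suggests the configured
    # size is far larger than the workload needs (wasted JVM heap).
    if stats.evictions == 0 and stats.entries < 0.5 * stats.max_size:
        return "oversized"
    # Evicting while rarely hitting suggests the cache is not paying for itself.
    if stats.evictions > 0 and hit_ratio < 0.5:
        return "low_value"
    return "ok"


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    sample = CacheStats("filterCache", 200_000, 1_500, 10_000, 9_000, 0)
    print(sample.name, classify(sample))
```

The point of the check is only to flag caches worth a second look; the right size still depends on the actual query mix.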

Solr4 cluster setup for high performance reads

2013-06-13 Thread Utkarsh Sengar
Hello, I am evaluating solr for indexing catalog info for about 45M products. The catalog mainly contains title and description, which take most of the space (other attributes are brand, category, price, etc.). The data is stored in cassandra and I am using datastax's solr (DSE 3.0.2) which handles incremen