On 4/8/2014 6:00 PM, Utkarsh Sengar wrote:
> Lots of questions indeed :)
>
> 1. Total virtual machines: 3
> 2. Replication factor: 0 (don't have any replicas yet)
> 3. Each machine has 1 shard which has 20GB of data. So data for a
> collection is spread across 3 machines totalling to 60GB
> 4. Start solr:
> java -Xmx10000m
> -javaagent:newrelic/newrelic.jar
> -Dsolr.clustering.enabled=true
> -Dsolr.solr.home=multicore
> -Djetty.class.path=lib/ext/* "
> -Dbootstrap_conf=true
> -DnumShards=3
> -DzkHost=localhost:2181 -jar start.jar"
> 5. Yes, all machines have 24GB RAM and 9GB heap. Separate process of ZK is
> running on these machine.
> 6. top screenshot: http://i.imgur.com/g6w9Bim.png

A follow-up question: What vendor and version of JVM are you running?

Excellent choices include very recent Java 6 releases from Oracle, Oracle Java 7u25, and whatever OpenJDK version corresponds to Oracle 7u25. Good choices include most versions of Oracle Java 7, Oracle Java 6, and OpenJDK7. The latest versions of Oracle Java 7 (from 7u40 to 7u51) have known bugs that affect Solr. OpenJDK6 and commercial Java versions from non-Oracle vendors like IBM are very bad choices, because they have known serious bugs. I don't know much about the Zing JVM, but it is probably a good choice. If you are running Zing, then what I'm saying below about GC pauses will not apply. Solr 4.8 will require Java 7, so if you plan to upgrade that far, be sure you're not using Java 6 at all.

One possible problem that I always investigate first is whether or not there's enough RAM to cache the index effectively. The 14GB of RAM left over for the OS disk cache (24GB total minus the heap) is not a perfect setup for a 20GB index, but it should be plenty. The fact that you still have 4GB of RAM free in your top screenshot is further evidence that you do have plenty of disk cache. No need to pursue that any further.

Garbage collection pauses, however, are a likely problem here. I have some personal experience with this problem. Because you're using the default collector with a heap of roughly 10GB (the -Xmx10000m in your start command), I can almost guarantee that this is a problem, even if New Relic isn't showing it. A program called jHiccup *will* show the problem.

http://www.azulsystems.com/jHiccup

These are my GC settings. They work very well and are not specific to a certain heap size, although I am sure that the config can be improved:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Regarding zookeeper: Are you running all three of your ZK instances in a redundant ensemble, where the config on each of them knows about all of them? You should definitely be doing this.
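
For reference, a redundant ensemble means the zoo.cfg on every node lists all three servers. A minimal sketch follows; the hostnames host1 through host3, the dataDir path, and the timing values are placeholders I'm assuming, not values taken from your setup:

```properties
# zoo.cfg, identical on all three machines (all values are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# One line per ensemble member: server.<id>=<host>:<peer port>:<election port>
server.1=host1:2888:3888
server.2=host2:2888:3888
server.3=host3:2888:3888
```

Each machine also needs a file named myid in its dataDir containing only its own id (1, 2, or 3), matching its server.<id> line.
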
If you are, then your zkHost parameter for Solr needs to reflect that:

-DzkHost=host1:2181,host2:2181,host3:2181

Using only localhost:2181 could cause problems, and they could look like the problems you are seeing.

Thanks,
Shawn