On 10/16/2014 6:27 PM, S.L wrote:
1. Java Version: java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
I believe that build 51 is one of those that is known to have bugs
related to Lucene. If you can upgrade this to 67, that would be good,
but I don't know that it's a pressing matter. It looks like the Oracle
JVM, which is good.
2. OS
CentOS Linux release 7.0.1406 (Core)
3. Everything is 64-bit: OS, Java, and CPU.
4. Java Args.
-Djava.io.tmpdir=/opt/tomcat1/temp
-Dcatalina.home=/opt/tomcat1
-Dcatalina.base=/opt/tomcat1
-Djava.endorsed.dirs=/opt/tomcat1/endorsed
-DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
server3.mydomain.com:2181
-DzkClientTimeout=20000
-DhostContext=solr
-Dport=8081
-Dhost=server1.mydomain.com
-Dsolr.solr.home=/opt/solr/home1
-Dfile.encoding=UTF8
-Duser.timezone=UTC
-XX:+UseG1GC
-XX:MaxPermSize=128m
-XX:PermSize=64m
-Xmx2048m
-Xms128m
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties
I would not use the G1 collector myself, but with the heap at only 2GB,
I don't know that it matters all that much. Even a worst-case
collection probably is not going to take more than a few seconds, and
you've already increased the zookeeper client timeout.
http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
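For comparison, a CMS-based alternative to G1 that was commonly used with Solr heaps of this size looks roughly like the fragment below. The specific values are illustrative, not a quote from the wiki page above; tune them for your own workload:

```
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
```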
5. Zookeeper ensemble has 3 zookeeper instances, which are external and
not embedded.
6. Container: I am using Apache Tomcat version 7.0.42.
*Additional Observations:*
I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
then compared the two lists. Even though each list has an equal number of
documents (96309 each), the document ids in them appear to be *mutually
exclusive*. Eyeballing the first few lines of ids in both lists, I did not
find even a single common id, and I tried at least 15 manually. It looks
to me like the replicas are disjoint sets.
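A quick way to check the overlap mechanically rather than by eyeballing is
to run `comm` on the two sorted id dumps. This is only a sketch: the
`printf` lines stand in for the real exports from the
distrib=false&fl=id queries (with wt=csv and a rows value large enough to
cover all 96309 docs, the output is one id per line):

```shell
# Stand-ins for the two real exports; in practice each file would come from
# .../select?q=*:*&distrib=false&fl=id&sort=id+asc&rows=100000&wt=csv
printf 'doc1\ndoc3\ndoc5\n' > replica1.ids
printf 'doc2\ndoc4\ndoc6\n' > replica2.ids

# comm requires sorted input; -12 suppresses lines unique to either file,
# leaving only the ids present in BOTH replicas.
sort -o replica1.ids replica1.ids
sort -o replica2.ids replica2.ids
comm -12 replica1.ids replica2.ids | wc -l   # prints 0 for fully disjoint lists
```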
Are you sure you hit both replicas of the same shard number? If you
are, then it sounds like something is going wrong with your document
routing, or maybe your clusterstate is really messed up. Recreating the
collection from scratch and doing a full reindex might be a good plan
... assuming this is possible for you. You could create a whole new
collection, and then when you're ready to switch, delete the original
collection and create an alias so your app can still use the old name.
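The swap described above can be done with the Collections API. The
hostname, collection names, and config name below are placeholders for
whatever your cluster actually uses:

```
# 1. Create the replacement collection and reindex into it:
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=CREATE&name=collection2&numShards=3&replicationFactor=2&collection.configName=myconf"

# 2. Once the reindex is verified, delete the original:
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=DELETE&name=collection1"

# 3. Alias the old name to the new collection so the app needs no change:
curl "http://server1.mydomain.com:8081/solr/admin/collections?action=CREATEALIAS&name=collection1&collections=collection2"
```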
How much total RAM do you have on these systems, and how large are those
index shards? With a shard having 96K documents, it sounds like your
whole index is probably just shy of 300K documents.
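A quick way to gather both numbers on each node is sketched below. The
index path is a placeholder; substitute the value of -Dsolr.solr.home on
your machines:

```shell
# Placeholder path: substitute your actual solr home on each node.
INDEX_HOME=/opt/solr/home1

# Total physical RAM on the box, in MB:
free -m | awk '/^Mem:/ {print "total RAM (MB):", $2}'

# On-disk size of each core's index (silently skipped if the path
# does not exist on the machine where you run this):
du -sh "$INDEX_HOME"/*/data/index 2>/dev/null || true
```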
Thanks,
Shawn