Shawn,
1. I will upgrade to JVM build 67 shortly.
2. This is a new collection; I was facing a similar issue in 4.7 and, based
on Erick's recommendation, I upgraded to 4.10.1 and created a new
collection.
3. Yes, I am hitting the replicas of the same shard, and I see the lists
are completely non-overlapping. I am using CloudSolrServer to add the
documents (a minimal sketch of the indexing and per-replica check is
included after the clusterstate below).
4. I have a 3-node physical cluster, with each node having 16GB of memory.
5. I also have a custom request handler defined in my solrconfig.xml as
below. However, I am not using it; I am only using the default select
handler. My MyCustomHandler class has been added to the source and
included in the build, but it is not being used for any requests yet.
<requestHandler name="/mycustomselect" class="solr.MyCustomHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="df">suggestAggregate</str>
    <str name="spellcheck.dictionary">direct</str>
    <!--<str name="spellcheck.dictionary">wordbreak</str>-->
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
6. The clusterstate.json is copied below:
{"dyCollection1":{
"shards":{
"shard1":{
"range":"80000000-d554ffff",
"state":"active",
"replicas":{
"core_node3":{
"state":"active",
"core":"dyCollection1_shard1_replica1",
"node_name":"server3.mydomain.com:8082_solr",
"base_url":"http://server3.mydomain.com:8082/solr"},
"core_node4":{
"state":"active",
"core":"dyCollection1_shard1_replica2",
"node_name":"server2.mydomain.com:8081_solr",
"base_url":"http://server2.mydomain.com:8081/solr",
"leader":"true"}}},
"shard2":{
"range":"d5550000-2aa9ffff",
"state":"active",
"replicas":{
"core_node1":{
"state":"active",
"core":"dyCollection1_shard2_replica1",
"node_name":"server1.mydomain.com:8081_solr",
"base_url":"http://server1.mydomain.com:8081/solr",
"leader":"true"},
"core_node6":{
"state":"active",
"core":"dyCollection1_shard2_replica2",
"node_name":"server3.mydomain.com:8081_solr",
"base_url":"http://server3.mydomain.com:8081/solr"}}},
"shard3":{
"range":"2aaa0000-7fffffff",
"state":"active",
"replicas":{
"core_node2":{
"state":"active",
"core":"dyCollection1_shard3_replica2",
"node_name":"server1.mydomain.com:8082_solr",
"base_url":"http://server1.mydomain.com:8082/solr",
"leader":"true"},
"core_node5":{
"state":"active",
"core":"dyCollection1_shard3_replica1",
"node_name":"server2.mydomain.com:8082_solr",
"base_url":"http://server2.mydomain.com:8082/solr"}}}},
"maxShardsPerNode":"1",
"router":{"name":"compositeId"},
"replicationFactor":"2",
"autoAddReplicas":"false"}}
Thanks!
On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey <[email protected]> wrote:
> On 10/16/2014 6:27 PM, S.L wrote:
>
>> 1. Java Version: java version "1.7.0_51"
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>
>
> I believe that build 51 is one of those that is known to have bugs related
> to Lucene. If you can upgrade this to 67, that would be good, but I don't
> know that it's a pressing matter. It looks like the Oracle JVM, which is
> good.
>
>> 2. OS
>> CentOS Linux release 7.0.1406 (Core)
>>
>> 3. Everything is 64 bit , OS , Java , and CPU.
>>
>> 4. Java Args.
>> -Djava.io.tmpdir=/opt/tomcat1/temp
>> -Dcatalina.home=/opt/tomcat1
>> -Dcatalina.base=/opt/tomcat1
>> -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>> -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
>> server3.mydomain.com:2181
>> -DzkClientTimeout=20000
>> -DhostContext=solr
>> -Dport=8081
>> -Dhost=server1.mydomain.com
>> -Dsolr.solr.home=/opt/solr/home1
>> -Dfile.encoding=UTF8
>> -Duser.timezone=UTC
>> -XX:+UseG1GC
>> -XX:MaxPermSize=128m
>> -XX:PermSize=64m
>> -Xmx2048m
>> -Xms128m
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties
>>
>
> I would not use the G1 collector myself, but with the heap at only 2GB, I
> don't know that it matters all that much. Even a worst-case collection
> probably is not going to take more than a few seconds, and you've already
> increased the zookeeper client timeout.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
>> 5. Zookeeper ensemble has 3 zookeeper instances, which are external and
>> are not embedded.
>>
>>
>> 6. Container: I am using Apache Tomcat version 7.0.42
>>
>> *Additional Observations:*
>>
>> I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
>> then compared the two lists. Eyeballing the first few lines of ids in both
>> lists, I could see that even though each list has an equal number of
>> documents (96,309 each), the document ids in them seem to be *mutually
>> exclusive*. I did not find even a single common id in those lists (I tried
>> at least 15 manually); it looks to me like the replicas are disjoint sets.
>>
>
> Are you sure you hit both replicas of the same shard number? If you are,
> then it sounds like something is going wrong with your document routing, or
> maybe your clusterstate is really messed up. Recreating the collection
> from scratch and doing a full reindex might be a good plan ... assuming
> this is possible for you. You could create a whole new collection, and
> then when you're ready to switch, delete the original collection and create
> an alias so your app can still use the old name.
>
> How much total RAM do you have on these systems, and how large are those
> index shards? With a shard having 96K documents, it sounds like your whole
> index is probably just shy of 300K documents.
>
> Thanks,
> Shawn
>
>