Have you filed a JIRA issue for this that I don't remember, Markus? We need to make sure this is fixed.

Any idea when that trunk version is from? Before or after 4.0?

- Mark

On Dec 14, 2012, at 6:36 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> We did not solve it, but reindexing can remedy the problem.
>
> -----Original message-----
>> From: John Nielsen <j...@mcb.dk>
>> Sent: Fri 14-Dec-2012 12:31
>> To: solr-user@lucene.apache.org
>> Subject: Re: Strange data-loss problem on one of our cores
>>
>> How did you solve the problem?
>>
>> --
>> Med venlig hilsen / Best regards
>>
>> *John Nielsen*
>> Programmer
>>
>> *MCB A/S*
>> Enghaven 15
>> DK-7500 Holstebro
>>
>> Kundeservice: +45 9610 2824
>> p...@mcb.dk
>> www.mcb.dk
>>
>> On Fri, Dec 14, 2012 at 12:04 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>
>>> FYI, we observe the same issue: after some time (days, months) a cluster running an older trunk version has at least two shards where the leader and the replica do not contain the same number of records. No recovery is attempted; it seems to think everything is all right. Also, one core of one of the unsynced shards waits forever loading /replication?command=detail&wt=json, while other cores load it in a few ms. Neither core of another unsynced shard shows this problem.
>>>
>>> -----Original message-----
>>>> From: John Nielsen <j...@mcb.dk>
>>>> Sent: Fri 14-Dec-2012 11:50
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Strange data-loss problem on one of our cores
>>>>
>>>> I did a manual commit, and we are still missing docs, so it doesn't look like the search race condition you mention.
>>>>
>>>> My boss wasn't happy when I mentioned that I wanted to try out unreleased code. I'll win him over, though, and return with my findings. It will probably be some time next week.
>>>>
>>>> Thanks for your help.
>>>>
>>>> --
>>>> Med venlig hilsen / Best regards
>>>>
>>>> *John Nielsen*
>>>> Programmer
>>>>
>>>> *MCB A/S*
>>>> Enghaven 15
>>>> DK-7500 Holstebro
>>>>
>>>> Kundeservice: +45 9610 2824
>>>> p...@mcb.dk
>>>> www.mcb.dk
>>>>
>>>> On Thu, Dec 13, 2012 at 4:10 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>>> Couple things to start:
>>>>>
>>>>> By default SolrCloud distributes updates a doc at a time. So if you have 1 shard, whatever node you index to, it will send updates to the other. Replication is only used for recovery, not for distributing data. So for some reason there is an IOException when it tries to forward.
>>>>>
>>>>> The other issue is not something that I've seen reported. Can/did you try another hard commit to make sure you had the latest searcher open when checking the # of docs on each node? There was previously a race around commit that could cause some issues around expected visibility.
>>>>>
>>>>> If you are able to, you might try out a nightly build - 4.1 will be ready very soon and has numerous bug fixes for SolrCloud.
>>>>>
>>>>> - Mark
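For anyone wanting to run the check Mark suggests above (hard commit, then compare per-core document counts), a minimal SolrJ sketch follows. It assumes the SolrJ 4.x HttpSolrServer API and reuses the hostnames and core name from the report below; distrib=false keeps each count local to the core being asked.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CompareCoreCounts {
        public static void main(String[] args) throws Exception {
            String[] nodes = {"http://varnish01.lynero.net:8000/solr",
                              "http://varnish02.lynero.net:8000/solr"};
            String core = "default1_Norwegian";
            for (String node : nodes) {
                HttpSolrServer server = new HttpSolrServer(node + "/" + core);
                server.commit();              // hard commit so the latest searcher is open
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);                 // only the count is needed
                q.set("distrib", "false");    // count this core only, no distributed search
                long numFound = server.query(q).getResults().getNumFound();
                System.out.println(node + "/" + core + " numFound=" + numFound);
                server.shutdown();
            }
        }
    }

If the two numbers still differ after the commits, the mismatch is real and not just a stale searcher.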
>>>>> On Dec 13, 2012, at 9:53 AM, John Nielsen <j...@mcb.dk> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We are seeing a strange problem on our 2-node Solr 4 cluster. This problem has resulted in data loss.
>>>>>>
>>>>>> We have two servers, varnish01 and varnish02. ZooKeeper is running on varnish02, but in a separate JVM.
>>>>>>
>>>>>> We index directly to varnish02 and we read from varnish01. Data is thus replicated from varnish02 to varnish01.
>>>>>>
>>>>>> I found this in the varnish01 log:
>>>>>>
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2} status=0 QTime=42
>>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2} status=0 QTime=41
>>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2} status=0 QTime=33
>>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish02.lynero.net:8000/solr/default1_Norwegian/&update.distrib=TOLEADER&wt=javabin&version=2} status=0 QTime=33
>>>>>> Dec 13, 2012 12:23:39 PM org.apache.solr.common.SolrException log
>>>>>> SEVERE: shard update error StdNode: http://varnish02.lynero.net:8000/solr/default1_Norwegian/:org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://varnish02.lynero.net:8000/solr/default1_Norwegian
>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:335)
>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:309)
>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>>     at java.lang.Thread.run(Thread.java:636)
>>>>>> Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
>>>>>>     at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
>>>>>>     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
>>>>>>     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
>>>>>>     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
>>>>>>     at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>>>>>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
>>>>>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
>>>>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
>>>>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
>>>>>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>>>>>     ... 11 more
>>>>>>
>>>>>> Dec 13, 2012 12:23:39 PM org.apache.solr.update.processor.DistributedUpdateProcessor doFinish
>>>>>> INFO: try and ask http://varnish02.lynero.net:8000/solr to recover
>>>>>>
>>>>>> It looks like it is sending updates from varnish01 to varnish02. I am not sure why, since we only index on varnish02. Updates should never be going from varnish01 to varnish02.
>>>>>>
>>>>>> Meanwhile on varnish02:
>>>>>>
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=16
>>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=15
>>>>>> Dec 13, 2012 12:23:36 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=16
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.handler.admin.CoreAdminHandler handleRequestRecoveryAction
>>>>>> INFO: It has been requested that we recover
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true} status=0 QTime=1
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select/ params={fq=site_guid:(2810678)&q=win} hits=0 status=0 QTime=17
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822111&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true} status=0 QTime=1
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select params={facet=false&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true} status=0 QTime=1
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select params={facet=on&sort=item_group_59700_name_int+asc,+variant_of_item_guid+asc&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&group.distributed.second=true&version=2&df=text&fl=docid&shard.url=varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397822138&group.field=groupby_variant_of_item_guid&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_59700_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&group.topgroups.groupby_variant_of_item_guid=2963217&group.topgroups.groupby_variant_of_item_guid=2963223&group.topgroups.groupby_variant_of_item_guid=2963219&group.topgroups.groupby_variant_of_item_guid=2963220&group.topgroups.groupby_variant_of_item_guid=2963221&group.topgroups.groupby_variant_of_item_guid=2963222&group.topgroups.groupby_variant_of_item_guid=2963224&group.topgroups.groupby_variant_of_item_guid=2963218&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=40&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true} status=0 QTime=1
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=26
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=22
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState doRecovery
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.DefaultSolrCoreState doRecovery
>>>>>> INFO: Running recovery - first canceling any ongoing recovery
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=25
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=24
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=20
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=25
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=23
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=21
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=23
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Norwegian] webapp=/solr path=/update params={distrib.from=http://varnish01.lynero.net:8000/solr/default1_Norwegian/&update.distrib=FROMLEADER&wt=javabin&version=2} status=0 QTime=16
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy run
>>>>>> INFO: Starting recovery process. core=default1_Norwegian recoveringAfterStartup=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState
>>>>>> INFO: Updating cloud state from ZooKeeper...
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.processor.LogUpdateProcessor finish
>>>>>>
>>>>>> And less than a second later:
>>>>>>
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
>>>>>> INFO: Attempting to PeerSync from http://varnish01.lynero.net:8000/solr/default1_Norwegian/ core=default1_Norwegian - recoveringAfterStartup=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
>>>>>> INFO: PeerSync: core=default1_Norwegian url=http://varnish02.lynero.net:8000/solr START replicas=[http://varnish01.lynero.net:8000/solr/default1_Norwegian/] nUpdates=100
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.update.PeerSync sync
>>>>>> WARNING: PeerSync: core=default1_Norwegian url=http://varnish02.lynero.net:8000/solr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
>>>>>> INFO: PeerSync Recovery was not successful - trying replication. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
>>>>>> INFO: Starting Replication Recovery. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.client.solrj.impl.HttpClientUtil createClient
>>>>>> INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2 process
>>>>>> INFO: A cluster state change has occurred - updating...
>>>>>>
>>>>>> State change on varnish01 at the same time:
>>>>>>
>>>>>> Dec 13, 2012 12:23:42 PM org.apache.solr.common.cloud.ZkStateReader$2 process
>>>>>> INFO: A cluster state change has occurred - updating...
>>>>>>
>>>>>> And a few seconds later on varnish02, the recovery finishes:
>>>>>>
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
>>>>>> INFO: Replication Recovery was successful - registering as Active. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
>>>>>> INFO: Finished recovery process. core=default1_Norwegian
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [default1_Danish] webapp=/solr path=/select params={facet=false&sort=item_group_56823_name_int+asc,+variant_of_item_guid+asc&group.distributed.first=true&facet.limit=1000&q.alt=*:*&q.alt=*:*&distrib=false&facet.method=enum&version=2&df=text&fl=docid&shard.url=varnish02.lynero.net:8000/solr/default1_Danish/|varnish01.lynero.net:8000/solr/default1_Danish/&NOW=1355397828395&group.field=groupby_variant_of_item_guid&facet.field=itemgroups_int_mv&fq=site_guid:(11440)&fq=item_type:(PRODUCT)&fq=language_guid:(1)&fq=item_group_56823_combination:(*)&fq=item_group_45879_combination:(*)&fq=is_searchable:(True)&querytype=Technical&mm=100%25&facet.missing=on&group.ngroups=true&facet.mincount=1&qf=%0a++++++++++text^0.5+name^1.2+searchable_text^0.8+typeahead_text^1.0+keywords^1.1+item_no^5.0%0a++++++++++ranking1_text^1.0+ranking2_text^2.0+ranking3_text^3.0%0a+++++++&wt=javabin&group.facet=true&defType=edismax&rows=0&facet.sort=lex&start=0&group=true&group.sort=name+asc&isShard=true} status=0 QTime=8
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState
>>>>>> INFO: Updating cloud state from ZooKeeper...
>>>>>>
>>>>>> Which is picked up on varnish01:
>>>>>>
>>>>>> Dec 13, 2012 12:23:48 PM org.apache.solr.common.cloud.ZkStateReader$2 process
>>>>>> INFO: A cluster state change has occurred - updating...
>>>>>>
>>>>>> It looks like it replicated successfully, only it didn't. The default1_Norwegian core on varnish01 now has 55.071 docs and the same core on varnish02 has 35.088 docs.
>>>>>>
>>>>>> I checked the log files for both JVMs and no stop-the-world GC was taking place.
>>>>>>
>>>>>> There is also nothing in the ZooKeeper log of interest that I can see.
>>>>>>
>>>>>> --
>>>>>> Med venlig hilsen / Best regards
>>>>>>
>>>>>> *John Nielsen*
>>>>>> Programmer
>>>>>>
>>>>>> *MCB A/S*
>>>>>> Enghaven 15
>>>>>> DK-7500 Holstebro
>>>>>>
>>>>>> Kundeservice: +45 9610 2824
>>>>>> p...@mcb.dk
>>>>>> www.mcb.dk
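Given the mismatch John describes (55.071 docs on varnish01 vs. 35.088 on varnish02) and Markus's report of a core that never finishes loading /replication?command=detail, a second hedged SolrJ sketch may be handy: pull the index version and generation from each core's replication handler, with timeouts so a hung core fails fast instead of blocking. Hostnames and the core name are again taken from the thread; the response keys ("details", "indexVersion", "generation") assume the 4.x ReplicationHandler output.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.util.NamedList;

    public class CheckReplicationDetails {
        public static void main(String[] args) throws Exception {
            String[] cores = {"http://varnish01.lynero.net:8000/solr/default1_Norwegian",
                              "http://varnish02.lynero.net:8000/solr/default1_Norwegian"};
            for (String url : cores) {
                HttpSolrServer server = new HttpSolrServer(url);
                server.setConnectionTimeout(5000);    // fail fast instead of waiting forever
                server.setSoTimeout(10000);           // on a core that never answers /replication
                SolrQuery q = new SolrQuery();
                q.setRequestHandler("/replication");  // a handler name starting with '/' is used as the request path
                q.set("command", "details");
                NamedList<?> details = (NamedList<?>) server.query(q).getResponse().get("details");
                System.out.println(url + " indexVersion=" + details.get("indexVersion")
                        + " generation=" + details.get("generation"));
                server.shutdown();
            }
        }
    }

Comparing numDocs per core (as in the earlier sketch) remains the more direct consistency check; the replication details are mainly useful here for spotting which core hangs on this request.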