Erick,

Thanks for the suggestion. I am not sure I would be able to capture what went wrong, so upgrading to 4.10 seems easier, even though it means a day's worth of effort :). I will go ahead and upgrade and let you know, although I am surprised that this issue was never reported for 4.7 until now.
Thanks again for your help!

On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> I think there were some holes that would allow replicas and leaders to be out of synch that have been patched up in the last 3 releases.
>
> There shouldn't be anything you need to do to keep these in synch, so if you can capture what happened when things got out of synch we'll fix it. But a lot has changed in the last several months, so the first thing I'd do if possible is to upgrade to 4.10.1.
>
> Best,
> Erick
>
> On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving...@gmail.com> wrote:
> > Hi Erick,
> >
> > Before I tried your suggestion of issuing a commit=true update, I realized that for each shard there was at least one node that had its index directory named like index.<timestamp>.
> >
> > I went ahead and deleted that index directory and restarted the core, and now the index directory has synched with the other node and is properly named 'index' without any timestamp attached to it. This is now giving me consistent results for distrib=true using a load balancer. Also, distrib=false returns the expected results for a given shard.
> >
> > The underlying issue appears to be that in every shard the leader and the replica (follower) were out of synch.
> >
> > How can I prevent this from happening again?
> >
> > Thanks for your help!
> >
> > Sent from my HTC
> >
> > ----- Reply message -----
> > From: "Erick Erickson" <erickerick...@gmail.com>
> > To: <solr-user@lucene.apache.org>
> > Subject: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
> > Date: Fri, Oct 3, 2014 12:56 AM
> >
> > Hmmmm. Assuming that you aren't re-indexing the doc you're searching for...
> >
> > Try issuing http://blah blah:8983/solr/collection/update?commit=true. That'll force all the docs to be searchable. Does <1> still hold for the document in question? Because this is exactly backwards of what I'd expect.
> > I'd expect, if anything, the replica (I'm trying to call it the "follower" when a distinction needs to be made since the leader is a "replica" too....) would be out of sync. This is still a Bad Thing, but the leader gets first crack at indexing things.
> >
> > bq: only the replica of the shard that has this key returns the result, and the leader does not,
> >
> > Just to be sure we're talking about the same thing. When you say "leader", you mean the shard leader, right? The filled-in circle on the graph view from the admin/cloud page.
> >
> > And let's see your soft and hard commit settings please.
> >
> > Best,
> > Erick
> >
> > On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com> wrote:
> >> Erick,
> >>
> >> 0> Load balancer is out of the picture.
> >>
> >> 1> When I query with *distrib=false*, I get consistent results as expected for those shards that don't have the key, i.e. I don't get results back for those shards. However, I just realized that with *distrib=false* present in the query for the shard that is supposed to contain the key, only the replica of the shard that has this key returns the result, and the leader does not. It looks like the replica and the leader do not have the same data, and the replica seems to contain the key in the query for that shard.
> >>
> >> 2> By indexing I mean this collection is being populated by a web crawler.
> >>
> >> So it looks like 1> above is pointing to the leader and replica being out of synch for at least one shard.
> >>
> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> bq: Also, the collection is being actively indexed as I query this, could that be an issue too?
> >>>
> >>> Not if the documents you're searching aren't being added as you search (and all your autocommit intervals have expired).
> >>>
> >>> I would turn off indexing for testing, it's just one more variable that can get in the way of understanding this.
> >>>
> >>> Do note that if the problem were endemic to Solr, there would probably be a _lot_ more noise out there.
> >>>
> >>> So to recap:
> >>> 0> we can take the load balancer out of the picture altogether.
> >>>
> >>> 1> when you query each shard individually with &distrib=false, every replica in a particular shard returns the same count.
> >>>
> >>> 2> when you query without &distrib=false you get varying counts.
> >>>
> >>> This is very strange and not at all expected. Let's try it again without indexing going on....
> >>>
> >>> And what do you mean by "indexing" anyway? How are documents being fed to your system?
> >>>
> >>> Best,
> >>> Erick@PuzzledAsWell
> >>>
> >>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com> wrote:
> >>> > Erick,
> >>> >
> >>> > I would like to add that the interesting behavior, i.e. point #2 that I mentioned in my earlier reply, happens in all the shards. If this were a distributed search issue, it should not have manifested itself in the shard that contains the key that I am searching for; it looks like the search is just failing as a whole intermittently.
> >>> >
> >>> > Also, the collection is being actively indexed as I query this; could that be an issue too?
> >>> >
> >>> > Thanks.
> >>> >
> >>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <simpleliving...@gmail.com> wrote:
> >>> >
> >>> >> Erick,
> >>> >>
> >>> >> Thanks for your reply, I tried your suggestions.
> >>> >>
> >>> >> 1. When not using the load balancer, if *I have distrib=false* I get consistent results across the replicas.
> >>> >>
> >>> >> 2. However, here's the interesting part: while not using the load balancer, if I *don't have distrib=false*, then when I query a particular node, I get the same behaviour as if I were using a load balancer, meaning the distributed search from a node works intermittently. Does this give any clue?
> >>> >>
> >>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >>
> >>> >>> Hmmm, nothing quite makes sense here....
> >>> >>>
> >>> >>> Here are some experiments:
> >>> >>> 1> avoid the load balancer and issue queries like
> >>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
> >>> >>>
> >>> >>> the &distrib=false bit will keep SolrCloud from trying to send the queries anywhere, they'll be served only from the node you address them to. That'll help check whether the nodes are consistent. You should be getting back the same results from each replica in a shard (i.e. 2 of your 6 machines).
> >>> >>>
> >>> >>> Next, try your failing query the same way.
> >>> >>>
> >>> >>> Next, try your failing query from a browser, pointing it at successive nodes.
> >>> >>>
> >>> >>> Where is the first place problems show up?
> >>> >>>
> >>> >>> My _guess_ is that your load balancer isn't quite doing what you think, or your cluster isn't set up the way you think it is, but those are guesses.
> >>> >>>
> >>> >>> Best,
> >>> >>> Erick
> >>> >>>
> >>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <simpleliving...@gmail.com> wrote:
> >>> >>> > Hi All,
> >>> >>> >
> >>> >>> > I am trying to query a 6 node Solr 4.7 cluster with 3 shards and a replication factor of 2.
> >>> >>> >
> >>> >>> > I have fronted these 6 Solr nodes using a load balancer. What I notice is that every time I do a search of the form q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me a result only once in every 3 tries, telling me that the load balancer is distributing the requests between the 3 shards and SolrCloud only returns a result if the request goes to the core that has that id.
> >>> >>> >
> >>> >>> > However, if I do a simple search like q=*:*, I consistently get the right aggregated results back of all the documents across all the shards for every request from the load balancer. Can someone please let me know what this is symptomatic of?
> >>> >>> >
> >>> >>> > Somehow SolrCloud seems to be doing search query distribution and aggregation for queries of type *:* only.
> >>> >>> >
> >>> >>> > Thanks.
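
For anyone repeating the check described in this thread (query every core directly with distrib=false and see whether the replicas of a shard agree), here is a minimal Python sketch. It is only an illustration, not something posted by either participant: the core URLs, the shard-to-core mapping, and the helper names select_url, num_found and out_of_sync_shards are all assumptions, and it relies only on the standard /select handler returning JSON with response.numFound.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, fq, distrib=False):
    """Build a /select URL for one core; distrib=false keeps the query local."""
    params = {
        "q": "*:*",
        "fq": fq,
        "rows": 0,          # only the count is needed, not the documents
        "wt": "json",
        "distrib": "true" if distrib else "false",
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(core_url, fq, fetch=None):
    """Return the numFound value reported by a single core for the filter query."""
    if fetch is None:
        # Default fetcher performs a real HTTP GET against the core.
        fetch = lambda url: urlopen(url, timeout=10).read()
    body = json.loads(fetch(select_url(core_url, fq)))
    return body["response"]["numFound"]


def out_of_sync_shards(shards, fq, fetch=None):
    """Map shard name -> {core_url: numFound} for shards whose replicas disagree."""
    mismatched = {}
    for shard, core_urls in shards.items():
        counts = {url: num_found(url, fq, fetch) for url in core_urls}
        if len(set(counts.values())) > 1:
            mismatched[shard] = counts
    return mismatched
```

To use it, pass a dict such as {"shard1": ["http://host1:8983/solr/<core>", "http://host2:8983/solr/<core>"], ...} (hypothetical hosts and core names; take the real ones from the admin/cloud graph) and a filter like "id:<your id>". An empty result means every replica in each shard reported the same count for that filter. As suggested earlier in the thread, pause indexing first so in-flight updates do not skew the comparison.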