As I understand your problem, it sounds like you were using your master as part of your search cluster, so the two queries that make up a distributed request were returning conflicting numbers.
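A rough sketch of the difference, with made-up host names and ports (only the s1..s8 entries are our real load-balancer aliases):

    # what I understood your old setup to be: the master itself listed as a searchable shard
    shards=master1:8983/solr,slave1:8983/solr,...

    # our setup: every entry is a load balancer sitting in front of two slaves
    shards=s1/solr,s2/solr,s3/solr,...,s8/solr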
In my scenario, our eight masters are used for updates and deletes only; no queries are issued to those nodes. When a distributed query is executed, it is possible for the two slaves behind a load balancer to be out of sync (i.e. one replicated faster than the other). What seems to rule that out here is that there is no activity on my servers right now and the search counts have stabilized, yet the query still consistently returns fewer results whenever I set the "start" URL parameter above 400.

On Tue, Jun 19, 2012 at 5:05 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 6/19/2012 2:32 PM, Justin Babuscio wrote:
>
>> 2) For the shards, we use the URL parameters,
>> shards=s1/solr,s2/solr,s3/solr,...,s8/solr
>> where s# points to a bare-metal load balancer that routes the requests
>> to one of the two slave shards.
>
> This most likely has nothing to do with your question about the changing
> numFound, just a side issue that I wanted to comment on. I was at one time
> using a similar method where I had each shard as an entry in the load
> balancer. This led to an unusual occasional problem.
>
> As you may know, a distributed query results in two queries being sent to
> each shard -- the first one finds the documents on each shard, then once
> Solr has gathered those results, it makes another request that retrieves
> the documents.
>
> Imagine that you have just updated your master server, and you make a
> query that will include one or more of the new documents in the results.
> If you make that query just after the master server gets updated, but
> before the slave has had a chance to copy and commit the changes, you can
> run into this: the first (search) query goes to the master server and will
> see the new document. The second (retrieval) query will then go to the
> slave, requesting a document that does not yet exist there. This *will*
> happen eventually. I would run into it at least once a day on a monitoring
> system that checked the age of the newest document.
>
> Here's one way to deal with that: I have a dedicated core on each server
> that has the shards parameter included in the request handler. This core
> does not have an index of its own; it exists only to act as a search
> broker, pointing at all the cores with the data. The name of this core is
> ncmain, and its standard request handler contains the following:
>
> <str name="shards">idxa2.example.com:8981/solr/inclive,idxa1.example.com:8981/solr/s0live,idxa1.example.com:8981/solr/s1live,idxa1.example.com:8981/solr/s2live,idxa2.example.com:8981/solr/s3live,idxa2.example.com:8981/solr/s4live,idxa2.example.com:8981/solr/s5live</str>
>
> On the servers for chain A (idxa1, idxa2), the shards parameter references
> only chain A server cores. On the servers for chain B (idxb1, idxb2), the
> shards parameter references only chain B server cores.
>
> The load balancer only talks to these broker cores, not the cores with the
> actual indexes. Neither the client nor the load balancer needs to use (or
> even know about) the shards parameter. That is handled entirely within the
> Solr configuration.
>
> Thanks,
> Shawn
>

--
Justin Babuscio
571-210-0035
http://linchpinsoftware.com
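P.S. To make sure I follow the two-phase behavior you describe, here is roughly how I picture the per-shard sub-requests (host names, field names, and ids below are illustrative, not taken from our logs):

    # phase 1: each shard reports which documents match (ids and scores only)
    http://shard1:8983/solr/select?q=foo&start=0&rows=10&isShard=true&fl=id,score

    # phase 2: the coordinating node asks specific shards for the full documents by id
    http://shard1:8983/solr/select?ids=DOC17,DOC42&isShard=true

If phase 1 happens to be answered by the master (which already has a freshly committed document) and phase 2 by the slave (which has not replicated it yet), the retrieval asks for an id that does not exist on that node -- which matches the failure you describe.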
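P.P.S. If we adopt the broker-core idea on our side, I picture the broker's solrconfig.xml looking roughly like this -- the handler name, class, and defaults block are my assumptions about the surrounding config; only the shards value itself comes from your mail (shortened here):

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <!-- the broker core holds no index of its own; it only fans out to the data cores -->
        <str name="shards">idxa2.example.com:8981/solr/inclive,idxa1.example.com:8981/solr/s0live,...,idxa2.example.com:8981/solr/s5live</str>
      </lst>
    </requestHandler>

That way our load balancer (and our clients) would only ever talk to the broker cores and would never need to know about the shards parameter, exactly as you describe.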