On 6/19/2012 2:32 PM, Justin Babuscio wrote:
> 2) For the shards, we use the URL
> parameters, shards=s1/solr,s2/solr,s3/solr,...,s8/solr
> where s# point to a baremetal load balancer that routes the requests to
> one of the two slave shards.
This most likely has nothing to do with your question about changing
numFound; it's just a side issue that I wanted to comment on. At one
time I used a similar method, with each shard as an entry in the load
balancer, and it led to an unusual, intermittent problem.
As you may know, a distributed query sends two rounds of requests to
the shards. The first round finds the matching documents on each shard
(just their unique keys and scores). Once Solr has merged those results,
a second round retrieves the stored fields for the documents that made
the final list.
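To make the two phases concrete, here is a simplified in-memory model in Python. It is only a sketch under stated assumptions, not Solr's implementation: the Shard class, its search/fetch methods, and the scores are all hypothetical, and real Solr sends each phase to the shards as separate HTTP requests.

```python
# Simplified in-memory model of a two-phase distributed search.
# Hypothetical: a "shard" is a dict of doc id -> (score, stored fields);
# real Solr sends each phase to the shards as an HTTP request.
import heapq
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    # doc id -> (relevance score for the current query, stored fields)
    docs: dict = field(default_factory=dict)

    def search(self, rows):
        """Phase 1: return only (score, id) pairs for this shard's top hits."""
        return heapq.nlargest(rows, ((s, d) for d, (s, _) in self.docs.items()))

    def fetch(self, doc_ids):
        """Phase 2: return stored fields for the requested ids, if present."""
        return {d: self.docs[d][1] for d in doc_ids if d in self.docs}


def distributed_query(shards, rows):
    # Phase 1: collect every shard's top hits, noting which shard each came from.
    candidates = [(score, doc_id, shard)
                  for shard in shards
                  for score, doc_id in shard.search(rows)]
    # Merge: keep the overall best `rows` hits across all shards.
    top = heapq.nlargest(rows, candidates, key=lambda c: c[0])

    # Phase 2: one retrieval request per shard that contributed to the merged list.
    wanted = {}
    for _, doc_id, shard in top:
        wanted.setdefault(shard.name, (shard, []))[1].append(doc_id)
    stored = {}
    for shard, ids in wanted.values():
        stored.update(shard.fetch(ids))
    # A doc that vanished between the phases would come back as None here.
    return [(doc_id, stored.get(doc_id)) for _, doc_id, _ in top]


s1 = Shard("s1", {"a": (0.9, {"title": "A"}), "b": (0.2, {"title": "B"})})
s2 = Shard("s2", {"c": (0.5, {"title": "C"})})
print(distributed_query([s1, s2], rows=2))
# [('a', {'title': 'A'}), ('c', {'title': 'C'})]
```

Note that phase 2 only asks for documents by id; it does not re-run the query, which is exactly why it can fail if it lands on a copy of the index that does not have those ids.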
Imagine that you have just updated your master server, and you make a
query that will include one or more of the new documents in the
results. If you make that query just after the master server gets
updated, but before the slave has had a chance to copy and commit the
changes, you can run into this: The first (search) query goes to the
master server and will see the new document. The second (retrieval)
query will then go to the slave, requesting a document that does not yet
exist there. This *will* happen eventually. I would run into it at
least once a day on a monitoring system that checked the age of the
newest document.
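The race can be reproduced with a toy model in Python. This is a hypothetical sketch, not a real load balancer: each replica is a dict of doc id -> stored fields, "search" matches every document, and the balancer hands out replicas in strict round-robin order, choosing independently for every request.

```python
# Toy model of the master/slave race when each request is balanced separately.
# Hypothetical assumptions: a replica is a dict of doc id -> stored fields,
# "search" matches every document, and balancing is strict round-robin.
import itertools


class RoundRobinBalancer:
    """Picks the next replica for every individual request."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def pick(self):
        return next(self._cycle)


def two_phase_query(balancer):
    # Phase 1 (search): whichever replica the balancer picks supplies the ids.
    ids = sorted(balancer.pick())
    # Phase 2 (retrieval): a separate request, so possibly a different replica.
    replica = balancer.pick()
    found = {doc_id: replica[doc_id] for doc_id in ids if doc_id in replica}
    missing = [doc_id for doc_id in ids if doc_id not in replica]
    return found, missing


# The master already has doc "new"; the slave has not replicated it yet.
master = {"old": {"title": "Old"}, "new": {"title": "New"}}
slave = {"old": {"title": "Old"}}

found, missing = two_phase_query(RoundRobinBalancer([master, slave]))
print(missing)  # ['new']: phase 1 hit the master, phase 2 hit the slave
```

If the balancer happens to send the search to the slave and the retrieval to the master, nothing goes wrong, because the master has every document the slave has. That asymmetry is why the failure is intermittent rather than constant.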
Here's one way to deal with that: I have a dedicated core on each server
that has the shards parameter included in its request handler. This
core has no index of its own; it exists only to act as a search broker,
pointing at all the cores that hold the data. The name of this core is
ncmain, and its standard request handler contains the following:
<str name="shards">idxa2.example.com:8981/solr/inclive,idxa1.example.com:8981/solr/s0live,idxa1.example.com:8981/solr/s1live,idxa1.example.com:8981/solr/s2live,idxa2.example.com:8981/solr/s3live,idxa2.example.com:8981/solr/s4live,idxa2.example.com:8981/solr/s5live</str>
On the servers for chain A (idxa1, idxa2), the shards parameter
references only chain A server cores. On the servers for chain B
(idxb1, idxb2), the shards parameter references only chain B server cores.
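The load balancer only talks to these broker cores, not the cores with
the actual indexes. Neither the client nor the load balancer needs to
use (or even know about) the shards parameter; that is handled entirely
within the Solr configuration. Because each broker sends its shard
requests directly to cores in its own chain rather than back through the
load balancer, the search and retrieval requests for a query always hit
the same copy of the index.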
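Here is the same toy model reworked for the broker layout, again as a hypothetical Python sketch rather than Solr code. The load balancer only chooses a broker; the Broker class, the make_chain helper, and the "ncmain@A"/"ncmain@B" labels are illustrative names, and each broker holds a fixed list of cores from its own chain, the way the shards parameter in its request handler does.

```python
# Toy model of the broker layout: the load balancer picks only a broker,
# and each broker queries a fixed list of cores from its own chain.
# Hypothetical assumptions: a core is a dict of doc id -> stored fields,
# and "search" matches every document.
import itertools


class Broker:
    """Stands in for an index-less core with a fixed shards list."""

    def __init__(self, name, cores):
        self.name = name
        self.cores = cores  # core name -> core contents, all in one chain

    def query(self):
        # Phase 1: search every core in this chain, remembering which core
        # each id came from.
        hits = [(doc_id, core_name)
                for core_name, core in self.cores.items()
                for doc_id in core]
        # Phase 2: retrieve from the very same cores. No load balancer sits
        # between the broker and its shards, so every id found above exists.
        return {doc_id: self.cores[core_name][doc_id] for doc_id, core_name in hits}


def make_chain(s0, s1):
    # Illustrative two-core chain; the core names are only examples.
    return {"s0live": s0, "s1live": s1}


# Chain A already has doc "new"; chain B has not caught up yet.
chain_a = make_chain({"old": "Old"}, {"new": "New"})
chain_b = make_chain({"old": "Old"}, {})

brokers = itertools.cycle([Broker("ncmain@A", chain_a), Broker("ncmain@B", chain_b)])
for _ in range(4):
    broker = next(brokers)  # the load balancer only ever picks a broker
    print(broker.name, sorted(broker.query()))
```

The two chains can briefly disagree about whether "new" exists, but each answer is internally consistent: a broker never asks for a document that its own cores do not have.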
Thanks,
Shawn