On 10/11/2010 6:32 PM, Peter Keegan wrote:
When Solr does a distributed search across shards, it does this in 2 phases
(correct me if I'm wrong):

1. 1st query to get the docIds and facet counts
2. 2nd query to retrieve the stored fields of the top hits

The problem here is that the index could change between (1) and (2), so it's
not an atomic transaction. If the stored fields were kept outside of Lucene,
only the first query would be necessary. However, this would mean that the
external NoSQL data store would have to be synchronized with the Lucene
index, which might present its own problems. (I'm just throwing this out for
discussion)

I have run into a related issue because of the way I use a load balancer.

I have a total of seven shards, each of which has a replica. I've got one set of machines set up as brokers that have the shards parameter in the standard request handler. Queries are sent to the load balancer, which sends each one to one of the brokers. The shards parameter points back at the load balancer, which ultimately forwards each shard request to an actual server.

I have a monitoring script that retrieves the latest document and raises an alarm if it's older than ten minutes. The following sequence happens on occasion:

1) An update is made to the master (happens every two minutes).
2) Monitoring script requests newest document.
3) Initial request is sent to master, finds ID.
4) Second request is sent to the slave, document not found.
5) Up to 15 seconds later, the slave replicates.
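
The sequence above can be reproduced with a toy model. This is only a sketch: the two dictionaries stand in for the master and slave indexes, and two_phase_lookup is a hypothetical helper, not anything in Solr's API. It shows the second phase missing a document that the first phase already found.

```python
from typing import Dict, Optional

Index = Dict[str, dict]  # document ID -> stored fields (toy stand-in for an index)


def two_phase_lookup(id_server: Index, fields_server: Index) -> Optional[dict]:
    """Phase 1 picks the newest ID on one server; phase 2 fetches it from another."""
    if not id_server:
        return None
    # Phase 1: find the ID of the newest document.
    newest_id = max(id_server, key=lambda doc_id: id_server[doc_id]["timestamp"])
    # Phase 2: fetch stored fields, possibly from a server that has not replicated yet.
    return fields_server.get(newest_id)


master = {"a": {"timestamp": 100}, "b": {"timestamp": 220}}
slave = {"a": {"timestamp": 100}}  # document "b" has not replicated yet

print(two_phase_lookup(master, slave))  # the slave does not have "b" yet
slave.update(master)                    # replication catches up
print(two_phase_lookup(master, slave))  # now the stored fields are found
```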

I solved this problem by having the monitoring script retry several times on failure, waiting a few seconds between attempts. Do I need to be terribly concerned about this impacting real queries?
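
A rough sketch of that retry loop, for anyone who wants to do the same. The names here (fetch_with_retry, the fetch callable, the attempt count and delay) are placeholders for illustration, not my actual script; fetch is assumed to return None when the document is not found.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], Optional[T]],
    attempts: int = 5,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until it returns something other than None or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Wait before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

With five attempts three seconds apart, the script waits at most 12 seconds in total, so the delay or the attempt count would need to go up a little to cover the full 15 second replication window.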

I do not actually need to load balance; I have slave servers purely for failover. Currently the load balancer has a 3 to 1 weight ratio favoring the slaves, which I plan to increase. At one time I had the master set up as a backup rather than as a lower-weight target, but haproxy seemed to take longer to recover from failures in that mode. I will have to do some more comprehensive testing. If there's a better solution than haproxy that works with heartbeat, I can change that.

Thanks,
Shawn
