Hi,

I have opened a couple of JIRAs: one to make the HttpShardHandlerFactory and LBHttpSolrServer more easily extended:
https://issues.apache.org/jira/browse/SOLR-4448
and one with an implementation of a backup-requesting load balancer:
https://issues.apache.org/jira/browse/SOLR-4449

The implementation does not attempt to cancel in-flight requests when a successful response is received. Instead, it returns the successful response immediately and then allows the in-flight requests to complete. That way it can detect 'zombie' servers in a way similar to the current load balancer and not send them requests for a specified time.

Phil

-----Original Message-----
From: Jeff Wartes [mailto:jwar...@whitepages.com]
Sent: 01 February 2013 01:51
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

For what it's worth, Google has done some pretty interesting research into coping with the idea that particular shards might very well be busy doing something else when your query comes in.

Check out this slide deck: http://research.google.com/people/jeff/latency.html

Lots of interesting ideas, but in particular, around slide 39 he talks about "backup requests", where you wait for something like your typical response time and then issue a second request to a different shard. You take whichever answer you get first, and cancel the other. The initial wait plus cancellation means your extra cluster load is minimal, and you still get the benefit of reducing your p95+ response times if the first request was high-latency due to something unrelated to the query. (Say, GC.)

Of course, a central principle of this approach is being able to cancel a query and have it stop consuming resources. I'd love to be corrected, but I don't think Solr allows this. You can stop waiting for a response, but even the timeAllowed param doesn't seem to stop resource usage after the allotted time. This means a few exceptionally long-running queries can take out your high-throughput cluster by tying up entire CPUs for long periods.

Let me know the JIRA number; I'd love to see work in this area.
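To make the "backup request" behaviour discussed above concrete, here is a minimal Java sketch. It waits roughly a typical response time, sends a backup request to another replica, returns the first successful answer, and leaves the other request to finish rather than cancelling it. It is not the SOLR-4449 patch and does not use SolrJ: the class name BackupRequestSketch, the query method, the send function and the 50 ms delay are all hypothetical, and zombie tracking is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; not the SOLR-4449 patch and not a SolrJ API.
final class BackupRequestSketch {

    /**
     * Sends the request to the first replica. If no response arrives within
     * backupDelayMillis, or the request fails, the next replica is tried as well.
     * The first successful response is returned; requests still in flight are
     * NOT cancelled and simply run to completion on the pool.
     */
    static <T> T query(List<String> replicas, Function<String, T> send,
                       long backupDelayMillis, ExecutorService pool) {
        CompletionService<T> inFlight = new ExecutorCompletionService<>(pool);
        int next = 0;     // index of the next replica to try
        int pending = 0;  // requests submitted but not yet collected
        Exception lastFailure = null;
        try {
            while (next < replicas.size() || pending > 0) {
                if (next < replicas.size()) {
                    String replica = replicas.get(next++);
                    inFlight.submit(() -> send.apply(replica));
                    pending++;
                }
                // While backups remain, wait only for the backup delay;
                // once every replica has been asked, block for any response.
                Future<T> done = next < replicas.size()
                        ? inFlight.poll(backupDelayMillis, TimeUnit.MILLISECONDS)
                        : inFlight.take();
                if (done == null) {
                    continue; // nothing answered in time: issue a backup request
                }
                pending--;
                try {
                    return done.get(); // first successful response wins
                } catch (ExecutionException e) {
                    lastFailure = e;   // this replica failed: keep going
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a replica", e);
        }
        throw new IllegalStateException("all replicas failed", lastFailure);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Simulated replicas: "slow" stands in for a server stuck in a GC pause.
        Function<String, String> send = replica -> {
            if (replica.equals("slow")) {
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            return "response from " + replica;
        };
        System.out.println(query(Arrays.asList("slow", "fast"), send, 50, pool));
        pool.shutdown();
    }
}
```

Because the slow request is left to run, a real balancer could inspect its eventual outcome and use it to mark a server as a zombie. The cost is that the extra load is not reclaimed, which is the trade-off raised in this thread.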
-----Original Message-----
From: Phil Hoy [mailto:p...@brightsolid.com]
Sent: Tuesday, January 29, 2013 11:33 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr load balancer

Hi Erick,

Thanks, I have read the blogs you cited and found them very interesting. We have tuned the JVM accordingly, but we still get the odd longish GC pause.

That said, we perhaps have an unusual setup: we index a lot of small documents using servers with SSDs and 128 GB RAM in a sharded setup with replicas, and our queries rely heavily on query filters and faceting with minimal free-text style searching. For that reason we rely heavily on the filter cache to improve query latency, and therefore we assign a large percentage of the available RAM to the JVM hosting Solr.

Anyhow, we are happy with the current configuration and performance profile, aside from the odd GC pause that is, and as we have index replicas it seems to me that we should be able to cope, hence my willingness to tweak how the load balancer behaves.

Thanks,
Phil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 20 January 2013 15:56
To: solr-user@lucene.apache.org
Subject: Re: Solr load balancer

Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's a great place to start:
http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
and:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've wondered about a similar approach, but by firing off the same query to multiple nodes in your cluster, you'll be effectively doubling (at least) the load on your system, perhaps leading to more memory issues in a "non-virtuous cycle".

FWIW,
Erick

On Fri, Jan 18, 2013 at 5:41 AM, Phil Hoy <p...@brightsolid.com> wrote:
> Hi,
>
> I would like to experiment with some custom load balancers to help with query
> latency in the face of long GC pauses and the odd time-consuming query that
> we need to be able to support. At the moment setting the socket timeout via
> the HttpShardHandlerFactory does help, but of course it can only be set to a
> length of time as long as the most time-consuming query we are likely to
> receive.
>
> For example, perhaps a load balancer that sends multiple queries concurrently
> to all/some replicas and only keeps the first response might be effective. Or
> maybe a load balancer which takes account of the frequency of timeouts would
> be able to recognize zombies more effectively.
>
> To use alternative load balancer implementations cleanly and without having
> to hack Solr directly, I would need to be able to make the existing
> LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension; I
> can then override the default load balancer using Solr's plugin mechanism.
>
> So my question is, if I made a patch to make the load balancer more
> pluggable, is this something that would be acceptable and if so what do I do
> next?
>
> Phil
>
> ______________________________________________________________________
> "brightsolid" is used in this email to collectively mean brightsolid online
> innovation limited and its subsidiary companies brightsolid online publishing
> limited and brightsolid online technology limited.
> findmypast.co.uk is a brand of brightsolid online publishing limited.
> brightsolid online innovation limited, Gateway House, Luna Place, Dundee
> Technology Park, Dundee DD2 1TP. Registered in Scotland No. SC274983.
> brightsolid online publishing limited, The Glebe, 6 Chapel Place, Rivington
> Street, London EC2A 3DQ. Registered in England No. 04369607.
> brightsolid online technology limited, Gateway House, Luna Place, Dundee
> Technology Park, Dundee DD2 1TP. Registered in Scotland No. SC161678.
>
> Email Disclaimer
>
> This message is confidential and may contain privileged information. You
> should not disclose its contents to any other person.
> If you are not the intended recipient, please notify the sender named above
> immediately. It is expressly declared that this e-mail does not constitute
> nor form part of a contract or unilateral obligation. Opinions, conclusions
> and other information in this message that do not relate to the official
> business of brightsolid shall be understood as neither given nor endorsed by
> it.
> ______________________________________________________________________
> This email has been scanned by the brightsolid Email Security System.
> Powered by MessageLabs
> ______________________________________________________________________