I've noticed that some of my queries take so long (5 min+) that by the time they return, there is no longer any plausible use for the search results. I've started calling these zombie queries because, well, they should be dead, but they just won't die. Instead, they stick around, wasting my Solr box's CPU, RAM, and I/O resources, and potentially causing more legitimate queries to stack up. (Regarding "stacking up", see SOLR-138.)
I may be able to prevent some of this by optimizing my index settings and by disallowing certain things, such as prefix wildcard queries (e.g. *ar). Right now, though, I'm most interested in figuring out how to get more robust server-side search timeouts in Solr. That would seem to strike a good balance between two goals:

1) I would like to allow users to attempt potentially expensive queries, such as queries with lots of wildcards or ranges.

2) I would like to make sure that potentially expensive queries don't turn into zombies -- especially long-lasting zombies.

For example, I think some of my users might be willing to wait a minute or two for certain classes of search to complete. But after that point, I'd really like to say enough is enough.

[Background]

While my load is pretty low (it's not a public-facing site), some of my queries are monsters that can take, say, over 5 minutes. (I don't know how much longer than 5 minutes they might take. Some of them might take hours, for all I know, if allowed to run to completion!) The biggest culprits currently seem to be wildcard queries. This is made worse by the fact that I've allowed prefix wildcard searches on an index with a large # of terms, and worse yet by word bigram indexing.

I've implemented the "timeAllowed" search timeout feature introduced in SOLR-502, and this does catch some searches that would have become zombies (some proximity searches, for example). But the timeAllowed mechanism does not catch everything. And, as I understand it, it's powerless to do anything about, say, a wildcard expansion that is taking forever.

The question is how to proceed.

[Option 1: Wait for someone to bring timeAllowed support to more parts of Solr search]

This might be nice, and I sort of assume it will happen eventually. I'd like a more immediate solution, though. Any thoughts on how hard it would be to add the timeout to, say, wildcard expansion?
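(For reference, my current timeAllowed usage looks roughly like this -- host, port, and the particular value are just illustrative:

```
http://localhost:8983/solr/select?q=foo*&timeAllowed=120000
```

where timeAllowed is in milliseconds. As far as I can tell it only bounds the document-collection phase, which is why the query-rewriting/expansion work escapes it.)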
I haven't figured out whether I know enough about Solr yet to work on this myself.

[Option 2: Add gross timeout support to StandardRequestHandler?]

What if I modified StandardRequestHandler so that, when it was invoked, the following would happen:

* spawn a new thread t to do the work that StandardRequestHandler would normally do
* start thread t
* sleep, waiting either for thread t to finish or for a timer to go off
* after waking up, check whether the timer went off; if so, terminate thread t

This would kill any runaway zombie queries. But maybe it would also have horrible side effects. Is it wishful thinking to believe that this might not screw up reference counting, or create deadlocks, or anything else?

[Option 3: Servlet container-level solutions?]

I thought Jetty and friends might have an option along the lines of "if a request is taking longer than x seconds, then abort the thread handling it". This seems troublesome in practice, though:

1) I can't find a servlet container whose documentation clearly states that this is possible.

2) I played with Jetty, and maxIdleTime sounded like it *might* cause this behavior, but experiments suggest otherwise.

3) This behavior sounds dangerous, unless you can convince the servlet container to abort only index-reading threads, while leaving index-writing threads alone.

Thanks for any advice,
Chris
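P.S. In case it clarifies Option 2, here's the rough shape of the spawn-and-wait logic I have in mind (class and method names are made up, not actual Solr code). Note the likely fatal flaw: Future.cancel(true) only *interrupts* the worker thread, and Thread.stop() is deprecated as unsafe, so a thread stuck in code that never checks for interruption can't actually be killed this way:

```java
import java.util.concurrent.*;

public class TimeoutWrapper {
    // Run a task with a hard timeout; on timeout, interrupt the worker.
    // Interruption only helps if the task periodically checks its
    // interrupted status -- which is exactly what long-running index
    // code may not do.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMs)
            throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        Future<T> future = exec.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: sends an interrupt only
            throw e;
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast task completes normally.
        System.out.println(runWithTimeout(() -> "done", 1000));
        // A slow task is abandoned after the timeout fires.
        try {
            runWithTimeout(() -> { Thread.sleep(10_000); return "never"; },
                           200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Even if the interrupt never lands, the request-handling thread at least gets to return an error to the client promptly; the worker thread would still leak until it finishes on its own, which is part of why I'm nervous about this option.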