See also https://issues.apache.org/jira/browse/SOLR-502 (timeout searches) and https://issues.apache.org/jira/browse/LUCENE-997. This is committed on trunk and will be in 1.3. Don't ask me how it works, because I haven't tried it yet, but maybe Sean Timm or someone can help out. I'm not sure if it returns partial results or not.
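For the curious: the mechanism SOLR-502 describes is a timeAllowed request parameter, with a partialResults flag in the response header when the limit kicks in. A sketch of how that would look from SolrJ, using a modern client that postdates this thread (the URL and core name are made up):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimeoutSearch {
        public static void main(String[] args) throws Exception {
            // Hypothetical Solr URL; adjust to your deployment.
            HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();

            SolrQuery q = new SolrQuery("shirt");
            q.setTimeAllowed(2000); // give up collecting hits after ~2 seconds

            QueryResponse rsp = solr.query(q);
            // If the time limit was hit, Solr marks the response as partial.
            Object partial = rsp.getResponseHeader().get("partialResults");
            System.out.println("partial=" + partial
                + ", numFound=" + rsp.getResults().getNumFound());
            solr.close();
        }
    }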
Also, what kind of caching/warming do you do? How often do these slow queries appear? Have you profiled your application yet? How many results are you retrieving?
In some cases, you may just want to figure out how to return a cached set of results for your most frequent, slow queries. I mean, if you know "shirt" is going to retrieve 2 million docs, what difference does it make if it really has 2 million and one documents? Do the query once, cache the top, oh, 1000, and be done. It doesn't even necessarily need to hit Solr. I know, I know, it's not search, but most search applications do these kinds of things.
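A minimal sketch of that caching idea in plain Java: an LRU map from the raw query string to its cached top hits, where QueryService is a hypothetical wrapper around your actual Solr client.

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    /** Caches the top-N results of frequent, expensive queries in front of Solr. */
    public class TopResultsCache {
        private static final int MAX_QUERIES = 500; // distinct queries to keep
        private static final int TOP_N = 1000;      // hits to cache per query

        // An access-ordered LinkedHashMap gives a simple LRU eviction policy.
        private final Map<String, List<String>> cache = Collections.synchronizedMap(
            new LinkedHashMap<String, List<String>>(MAX_QUERIES, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                    return size() > MAX_QUERIES;
                }
            });

        /** Returns cached doc ids, or runs the query once and caches its top hits. */
        public List<String> topDocs(String query, QueryService solr) {
            return cache.computeIfAbsent(query, q -> solr.search(q, TOP_N));
        }

        /** Hypothetical interface over whatever client actually talks to Solr. */
        public interface QueryService {
            List<String> search(String query, int rows);
        }
    }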
Still, it would be nice if there were a slightly better solution for you.
On Sep 12, 2008, at 2:17 PM, Jason Rennie wrote:
Thanks for all the replies!
Mike: we're not using pf. Our fq is always "status:0". The "status" field is "0" for all good docs (90%+) and some other integer for any docs we don't want returned.
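Sending that clause as a filter query keeps it out of scoring and lets Solr cache the matching doc set in its filterCache, so the 90%+ "good docs" set is computed once and reused across requests. A SolrJ sketch (only the field name and value come from the thread):

    import org.apache.solr.client.solrj.SolrQuery;

    public class StatusFilter {
        static SolrQuery buildQuery(String userQuery) {
            SolrQuery q = new SolrQuery(userQuery);
            // Filter queries are cached in the filterCache and don't affect ranking.
            q.addFilterQuery("status:0");
            return q;
        }
    }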
Jeyrl: federated search is definitely something we'll consider.
On Fri, Sep 12, 2008 at 8:39 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
The bottleneck may simply be that there are a lot of docs to score, since you are using fairly common terms.
Yeah, I'm coming to the realization that it may be as simple as that. Even a short, simple query like "shirt" can take seconds to return, presumably because it hits ("numFound") 2 million docs.
Also, what file format (compound, non-compound) are you using? Is it optimized? Have you profiled your app for these queries? When you say the "query is longer", define "longer"... 5 terms? 50 terms? Do you have lots of deleted docs? Can you share your DisMax params? Are you doing wildcard queries? Can you share the syntax of one of the offending queries?
I think we're using the non-compound format. We see eight different files (fdt, fdx, fnm, etc.) in an optimized index. Yes, it's optimized. It's also read-only; we don't update/delete. DisMax: we specify qf, fl, mm, fq; mm=1; we use boosts for qf. No wildcards. Example query: "shirt"; it takes 2 secs to run according to the Solr log and hits 2 million docs.
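Roughly what such a request looks like through SolrJ; the parameter names (qf, fl, mm, fq) come from the thread, while the field names and boost values here are invented:

    import org.apache.solr.client.solrj.SolrQuery;

    public class DisMaxExample {
        static SolrQuery shirtQuery() {
            SolrQuery q = new SolrQuery("shirt");
            q.set("defType", "dismax");
            q.set("qf", "title^2.0 description^0.5"); // boosted query fields (invented names)
            q.set("mm", "1");                         // minimum-should-match: one clause
            q.set("fl", "id,title");                  // stored fields to return (invented)
            q.addFilterQuery("status:0");             // the status filter described above
            return q;
        }
    }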
Since you want to keep "stopwords", you might consider a slightly better use of them, whereby you use them in n-grams only during query parsing.
Not sure what you mean here...
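What this likely means: instead of matching stopwords on their own, glue each one to its neighboring term so queries search much rarer bigram tokens, e.g. "the shirt" becomes the single token "the_shirt". Later Lucene releases ship exactly this as CommonGramsFilter; the sketch below uses that modern API (not available when this thread was written) and a made-up common-words list. The index side would use CommonGramsFilter alone, while the query side adds CommonGramsQueryFilter so only the grams are searched.

    import java.util.Arrays;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.commongrams.CommonGramsFilter;
    import org.apache.lucene.analysis.commongrams.CommonGramsQueryFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class CommonGramsSketch {
        // Words too common to be worth matching alone (made-up list).
        static final CharArraySet COMMON =
            new CharArraySet(Arrays.asList("the", "a", "of"), true);

        // Query-side analyzer: "the shirt" is emitted as the single token
        // "the_shirt", whose postings list is far shorter than "the"'s.
        static Analyzer queryAnalyzer() {
            return new Analyzer() {
                @Override
                protected TokenStreamComponents createComponents(String fieldName) {
                    Tokenizer source = new StandardTokenizer();
                    TokenStream result = new LowerCaseFilter(source);
                    result = new CommonGramsQueryFilter(new CommonGramsFilter(result, COMMON));
                    return new TokenStreamComponents(source, result);
                }
            };
        }
    }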
See also https://issues.apache.org/jira/browse/LUCENE-494 for related stuff.
Thanks for the pointer.
Jason
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ