Thanks for all the replies!

Mike: we're not using pf.  Our fq is always "status:0".  The "status" field
is "0" for all good docs (90%+) and some other integer for any docs we don't
want returned.

Jeyrl: federated search is definitely something we'll consider.

On Fri, Sep 12, 2008 at 8:39 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

 The bottleneck may simply be there are a lot of docs to score since you are
 using fairly common terms.

Yeah, I'm coming to the realization that it may be as simple as that.  Even
a short, simple query like "shirt" can take seconds to return, presumably
because it hits ("numFound") 2 million docs.


 Also, what file format (compound, non-compound) are you using?  Is it
 optimized?  Have you profiled your app for these queries?  When you say the
 "query is longer", define "longer"...  5 terms?  50 terms?  Do you have lots
 of deleted docs?  Can you share your DisMax params?  Are you doing wildcard
 queries?  Can you share the syntax of one of the offending queries?


I think we're using the non-compound format.  We see eight different files
(fdt, fdx, fnm, etc.) in an optimized index.  Yes, it's optimized.  It's
also read-only: we don't update or delete.  DisMax: we specify qf, fl, mm,
and fq; mm=1; we use boosts for qf.  No wildcards.  Example query: "shirt";
it takes 2 secs to run according to the Solr log and hits 2 million docs.
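
For reference, a minimal sketch of how a request with those parameters might be assembled.  The field names, boosts, and fl value are hypothetical placeholders (they are not given in this thread), and selecting the parser with defType=dismax is an assumption; a setup may instead route to a dismax request handler.

```python
from urllib.parse import urlencode


def dismax_params(user_query):
    """Build the request parameters for one DisMax query."""
    return [
        ("q", user_query),
        ("defType", "dismax"),                # assumption: parser chosen per request
        ("qf", "title^2.0 description^0.5"),  # hypothetical fields and boosts
        ("fl", "id,score"),                   # hypothetical field list
        ("mm", "1"),                          # at least one clause must match
        ("fq", "status:0"),                   # filter query: only "good" docs
    ]


# URL-encoded query string to append to the select URL.
query_string = urlencode(dismax_params("shirt"))
```

The filter query only restricts which docs are eligible; every doc that matches "shirt" and passes the filter still has to be scored.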


 Since you want to keep "stopwords", you might consider a slightly better
 use of them, whereby you use them in n-grams only during query parsing.


Not sure what you mean here...

You might want to look at how Nutch handles this issue. Nutch also has stopwords that it wants to keep around, so it generates combo terms like the-<next term> in the index. The query parser does the same thing, so if your query phrase has common terms, you wind up searching across a much smaller slice of your index.

This comes, of course, at the expense of a larger index with a lot more unique terms (due to all of the combo terms).

But this can be a big win. For example, at our site (http://www.krugle.org) we index source files. Without this optimization, searches could take several seconds. With it, we got down to under 100 ms with lots of breathing room.
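
The combo-term idea described above can be sketched roughly as follows. This is only an illustration, not Nutch's actual code: the COMMON_TERMS set, the "-" joiner, and both function names are assumptions, and token positions (needed for real phrase matching) are ignored.

```python
# Hypothetical list of terms treated as "common"; a real list would be
# derived from term frequencies in the index.
COMMON_TERMS = {"the", "a", "of", "in", "to"}


def index_terms(tokens):
    """Terms to index: every token, plus a combo term for each common
    term and the token that follows it."""
    terms = []
    for i, tok in enumerate(tokens):
        terms.append(tok)
        if tok in COMMON_TERMS and i + 1 < len(tokens):
            terms.append(f"{tok}-{tokens[i + 1]}")
    return terms


def query_terms(tokens):
    """Terms to search for a phrase: a common term and its follower are
    replaced by their combo term, so the short posting list of the combo
    is read instead of the huge one for the common term."""
    terms = []
    i = 0
    while i < len(tokens):
        if tokens[i] in COMMON_TERMS and i + 1 < len(tokens):
            terms.append(f"{tokens[i]}-{tokens[i + 1]}")
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms
```

With this sketch, query_terms(["the", "shirt"]) gives ["the-shirt"], and index_terms on the same tokens gives ["the", "the-shirt", "shirt"], so the query never has to walk the posting list for "the" on its own.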

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
