On 11-Sep-08, at 8:24 AM, Jason Rennie wrote:
> We have a 14 million document index that we only use for querying
> (optimized, read-only). When we issue queries that have few, relatively
> rare words, the query returns quickly. However, when the query is longer
> and uses more common words (hitting, say, ~1 million docs), it might take
> seconds to return. I'd like to know: what's the bottleneck? It doesn't
> seem to be disk; I/O wait times on the machine are much, much lower than
> on our database servers (e.g. 3% vs. 50%). Our search server is an 8-core
> machine and we do see CPU regularly holding above 100%, so CPU seems
> plausible, but would it really take that long to compute scores?
>
> We're using DisMax. There are a number of different fields that we search
> over (5 to be exact). We also have an fq on a single-digit status field.
>
> Does it make sense that computation time could easily exceed a second? If
> CPU is the bottleneck, is there anything we could do to easily trim down
> computation time (besides removing common words from the query)?

Are you using pf? Phrase queries are much more expensive than term queries.
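
One way to check whether pf is the cost is to send the same query with and
without it and compare the QTime values Solr reports. Below is a minimal
sketch in Python that only builds the two request URLs. The field names
(title, body), the status value, and the host are made up for illustration,
and defType=dismax is an assumption about how DisMax is selected; your
handler may already be configured for it.

```python
from urllib.parse import urlencode


def dismax_params(user_query, use_phrase_boost):
    """Build DisMax request parameters; field names here are hypothetical."""
    params = {
        "defType": "dismax",   # assumption: DisMax selected per request
        "q": user_query,
        "qf": "title body",    # hypothetical query fields (term matching)
        "fq": "status:1",      # hypothetical single-digit status filter
        "rows": "10",
    }
    if use_phrase_boost:
        # pf adds a phrase query over these fields on top of the term
        # queries, which is the expensive part being tested here.
        params["pf"] = "title body"
    return params


def select_url(base_url, params):
    """Append URL-encoded parameters to a Solr select endpoint."""
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr/select"  # hypothetical host
    query = "some long query with common words"
    print(select_url(base, dismax_params(query, use_phrase_boost=True)))
    print(select_url(base, dismax_params(query, use_phrase_boost=False)))
```

If the second URL comes back much faster for the slow queries, the phrase
boost is a likely culprit. Repeat each request a few times so cache warm-up
does not skew the comparison.
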
If you have a restrictive fq, you might try an approach similar to the
one in https://issues.apache.org/jira/browse/SOLR-407.
-Mike