Hi again,
More analysis shows that the extraordinarily long query times only
appear when I specify a sort. A concrete example:
For a query string such as:
?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
the QTime is ~500 ms.
For a query string such as:
?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
the QTime is ~75 s.
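To compare the two requests outside a browser, a minimal timing sketch could issue each variant and report Solr's QTime next to the wall-clock time. It assumes a Solr instance at the hypothetical URL `http://localhost:8983/solr/select` and switches to `wt=json` so QTime can be read from `responseHeader`; both are assumptions, not part of the original setup.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint (assumption); adjust host, port and path as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_url(sort=None):
    """Build the select URL from the examples above, optionally with a sort."""
    params = {
        "q": "apache user_id:39",
        "start": 0,
        "rows": 1,
        "fl": "*,score",
        "qt": "standard",
        "wt": "json",  # assumption: JSON output so QTime is easy to parse
    }
    if sort is not None:
        params["sort"] = sort
    return SOLR_SELECT + "?" + urlencode(params)


def time_query(url):
    """Return (QTime reported by Solr in ms, wall-clock time in seconds)."""
    started = time.perf_counter()
    with urlopen(url) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - started
    return body["responseHeader"]["QTime"], elapsed
```

For example, `time_query(build_url())` and `time_query(build_url("date_added asc"))` can be run back to back, and repeated, to see whether the sorted query stays slow or only the first one is.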
That is, I am using the StandardRequestHandler to search for a
user-entered term ("apache" above) and to filter by a user_id field.
This seems to be the case for every sort option except score asc and
score desc. Please tell me Solr doesn't sort all matching documents
before applying boolean filters?
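The behaviour the question hopes for can be modelled in a few lines: restrict to the matching documents first, then keep only the top `rows` by the sort key instead of ordering everything. This is a toy, in-memory illustration over hypothetical documents, not Solr's or Lucene's actual implementation; `heapq.nsmallest` stands in for a bounded priority queue.

```python
import heapq
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: int
    user_id: int
    text: str
    date_added: str  # ISO-8601 dates order correctly as plain strings


def search(docs, term, user_id, rows=1):
    """Filter first, then keep only the `rows` oldest matches (date_added asc)."""
    matches = (
        d for d in docs
        if d.user_id == user_id and term in d.text.split()
    )
    # nsmallest keeps at most `rows` candidates at a time, so the ordering
    # work scales with the number of matches, not with the whole corpus.
    return heapq.nsmallest(rows, matches, key=lambda d: d.date_added)
```

One thing the model leaves out: a real engine may need per-field data for the sort before it can compare anything. Lucene, for instance, sorts on a field through its FieldCache, which is filled for every document in the index the first time that field is sorted on, so a first sorted query can be far slower than later ones.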
James
Begin forwarded message:
From: James Brady <[EMAIL PROTECTED]>
Date: 11 February 2008 23:38:16 GMT-08:00
To: solr-user@lucene.apache.org
Subject: Performance help for heavy indexing workload
Hello,
I'm looking for some configuration guidance to help improve
performance of my application, which tends to do a lot more
indexing than searching.
At present, it needs to index around two documents per second, each
document being the stripped content of a webpage. However,
performance was so poor that I've had to disable indexing of the
webpage content as an emergency measure. In addition, some search
queries take an inordinate amount of time, regularly over 60 seconds.
This is running on a medium-sized EC2 instance (2 x 2GHz Opterons
and 8GB RAM), and there's not much else running on the box. In
total, there are about 1.5 million documents in the index.
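The figures above can be related with a little arithmetic. Assuming (as an approximation) that the two-documents-per-second rate is sustained around the clock, the index would receive 172,800 documents a day, so the current 1.5 million documents amount to under nine days of input at that rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(rate_per_sec):
    """Documents indexed per day, assuming the rate holds around the clock."""
    return rate_per_sec * SECONDS_PER_DAY


def days_to_reach(total_docs, rate_per_sec):
    """Days of sustained indexing needed to accumulate total_docs."""
    return total_docs / docs_per_day(rate_per_sec)


daily = docs_per_day(2)             # 172800
days = days_to_reach(1_500_000, 2)  # roughly 8.7
```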
I'm using a fairly standard configuration; the things I've tried
changing so far are parameters such as maxMergeDocs, mergeFactor
and the autoCommit options. I'm only using the
StandardRequestHandler, no faceting. I have a scheduled task
causing a database commit every 15 seconds.
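A scheduled commit like this is typically an HTTP POST of `<commit/>` to Solr's XML update handler. The sketch below assumes a hypothetical update URL of `http://localhost:8983/solr/update` and also works out how much each commit covers: at two documents per second and a 15-second interval, that is about 30 documents per commit and 5,760 commits a day. Since each Solr commit normally opens a new searcher, with whatever cache warming that entails, the interval is one of the knobs worth experimenting with.

```python
from urllib.request import Request, urlopen

# Hypothetical endpoint (assumption); adjust for your installation.
SOLR_UPDATE = "http://localhost:8983/solr/update"


def commit_request(url=SOLR_UPDATE):
    """Build the POST request asking Solr to commit pending documents."""
    return Request(
        url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_commit(url=SOLR_UPDATE):
    """Send the commit and return the HTTP status code."""
    with urlopen(commit_request(url)) as response:
        return response.status


def docs_per_commit(rate_per_sec, interval_sec):
    """Average number of documents covered by each scheduled commit."""
    return rate_per_sec * interval_sec


def commits_per_day(interval_sec):
    """Number of scheduled commits in a 24-hour day."""
    return 24 * 60 * 60 // interval_sec
```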
Obviously every workload is different, but could anyone comment on
whether this hardware should, with proper configuration, be able to
handle this kind of workload?
I can't see any signs of Solr being IO-bound, CPU-bound or
memory-bound, although my scheduled commit operation, or perhaps GC,
does cause periodic spikes in CPU utilisation.
Any help appreciated!
James