Well, the *first* sort against the underlying Lucene engine is expensive, since it has to build up the terms to sort on. I wonder if you're closing and reopening the underlying searcher for every request? That would be a definite limiter.
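
To make the point above concrete, here is a toy Python sketch. It is not Lucene's or Solr's API: `ToySearcher`, its `search` method, and the `cache_builds` counter are all hypothetical names made up for illustration. It only models the general idea that the sort values for a field are collected once per searcher, so a searcher that is kept open pays that cost once, while a searcher reopened for every request pays it every time.

```python
class ToySearcher:
    """Hypothetical stand-in for an index searcher (not a real Lucene/Solr class)."""

    def __init__(self, docs):
        self.docs = docs          # list of dicts, one per document
        self._sort_cache = {}     # field name -> list of sort values, indexed by doc id
        self.cache_builds = 0     # how many times the expensive step has run

    def _sort_values(self, field):
        # The expensive step: visit every document once to collect its sort value.
        if field not in self._sort_cache:
            self._sort_cache[field] = [doc[field] for doc in self.docs]
            self.cache_builds += 1
        return self._sort_cache[field]

    def search(self, predicate, sort_field):
        # Find matching doc ids, then order them by the cached sort values.
        values = self._sort_values(sort_field)
        hits = [i for i, doc in enumerate(self.docs) if predicate(doc)]
        return sorted(hits, key=values.__getitem__)


# Made-up data: 1000 documents, newer documents have smaller date_added values.
docs = [{"user_id": i % 50, "date_added": 1_000_000 - i} for i in range(1000)]


def by_user_39(doc):
    return doc["user_id"] == 39


# Case 1: one searcher shared across three requests.
shared = ToySearcher(docs)
for _ in range(3):
    result = shared.search(by_user_39, "date_added")

# Case 2: a fresh searcher for each of three requests.
rebuilds = 0
for _ in range(3):
    fresh = ToySearcher(docs)
    fresh.search(by_user_39, "date_added")
    rebuilds += fresh.cache_builds

print("shared searcher cache builds:", shared.cache_builds)
print("fresh searcher cache builds:", rebuilds)
```

In this toy model the shared searcher collects the sort values once and the per-request searchers collect them three times. Whether that is what is actually happening in the setup described below is exactly the open question.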
Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask me how to change this behavior <G>. But your comment about frequent updates to the index prompted this question....

Best
Erick

On Feb 12, 2008 3:54 AM, James Brady <[EMAIL PROTECTED]> wrote:

> Hi again,
> More analysis showed that the extraordinarily long query times only
> appeared when I specify a sort. A concrete example:
>
> For a querystring such as:
> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
> The QTime is ~500ms.
> For a querystring such as:
> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
> The QTime is ~75s
>
> I.e. I am using the StandardRequestHandler to search for a user
> entered term ("apache" above) and filtering by a user_id field.
>
> This seems to be the case for every sort option except score asc and
> score desc. Please tell me Solr doesn't sort all matching documents
> before applying boolean filters?
>
> James
>
> Begin forwarded message:
>
> > From: James Brady <[EMAIL PROTECTED]>
> > Date: 11 February 2008 23:38:16 GMT-08:00
> > To: solr-user@lucene.apache.org
> > Subject: Performance help for heavy indexing workload
> >
> > Hello,
> > I'm looking for some configuration guidance to help improve
> > performance of my application, which tends to do a lot more
> > indexing than searching.
> >
> > At present, it needs to index around two documents / sec - a
> > document being the stripped content of a webpage. However,
> > performance was so poor that I've had to disable indexing of the
> > webpage content as an emergency measure. In addition, some search
> > queries take an inordinate length of time - regularly over 60 seconds.
> >
> > This is running on a medium sized EC2 instance (2 x 2GHz Opterons
> > and 8GB RAM), and there's not too much else going on on the box. In
> > total, there are about 1.5m documents in the index.
> >
> > I'm using a fairly standard configuration - the things I've tried
> > changing so far have been parameters like maxMergeDocs, mergeFactor
> > and the autoCommit options. I'm only using the
> > StandardRequestHandler, no faceting. I have a scheduled task
> > causing a database commit every 15 seconds.
> >
> > Obviously, every workload varies, but could anyone comment on
> > whether this sort of hardware should, with proper configuration, be
> > able to manage this sort of workload?
> >
> > I can't see signs of Solr being IO-bound, CPU-bound or
> > memory-bound, although my scheduled commit operation, or perhaps GC,
> > does spike up the CPU utilisation at intervals.
> >
> > Any help appreciated!
> > James