Well, the *first* sort against the underlying Lucene engine is expensive,
since it has to build up the terms to sort on. I wonder if you're closing
and reopening the underlying searcher for every request? That would
definitely be a limiter.
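
A toy Python model of why reopening hurts, assuming the sort data lives
with the searcher instance. `SortCachingSearcher`, `is_user_39` and the
two `builds_when_*` helpers are made up for illustration and are NOT
Lucene's API. The sort values are built on the first sorted query and
then reused, so reopening per request pays the build cost every time:

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]


class SortCachingSearcher:
    """Hypothetical toy searcher, NOT Lucene's IndexSearcher API."""

    def __init__(self, docs: List[Doc]):
        self.docs = docs
        self.sort_cache: Dict[str, List[str]] = {}  # field -> value per doc
        self.cache_builds = 0

    def _sort_values(self, field: str) -> List[str]:
        # Expensive step, done once per field per searcher: walk EVERY
        # document in the index, not just the matches, to collect values.
        if field not in self.sort_cache:
            self.sort_cache[field] = [d.get(field, "") for d in self.docs]
            self.cache_builds += 1
        return self.sort_cache[field]

    def search(self, match: Callable[[Doc], bool], sort_field: str) -> List[int]:
        values = self._sort_values(sort_field)
        hits = [i for i, d in enumerate(self.docs) if match(d)]
        # Only the matching doc ids are sorted, using the cached values.
        return sorted(hits, key=lambda i: values[i])


def is_user_39(d: Doc) -> bool:
    return d.get("user_id") == "39"


def builds_when_reused(docs: List[Doc], n_requests: int) -> int:
    searcher = SortCachingSearcher(docs)  # opened once, shared by all requests
    for _ in range(n_requests):
        searcher.search(is_user_39, "date_added")
    return searcher.cache_builds


def builds_when_reopened(docs: List[Doc], n_requests: int) -> int:
    total = 0
    for _ in range(n_requests):
        searcher = SortCachingSearcher(docs)  # fresh searcher, empty cache
        searcher.search(is_user_39, "date_added")
        total += searcher.cache_builds
    return total


docs = [
    {"user_id": "39", "date_added": "2008-02-11"},
    {"user_id": "7", "date_added": "2008-02-10"},
    {"user_id": "39", "date_added": "2008-02-09"},
]
print(builds_when_reused(docs, 10))    # 1
print(builds_when_reopened(docs, 10))  # 10
```

In this model matching still happens before sorting, but the one-off
cache build walks every document, so its cost tracks index size rather
than the number of hits.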

Disclaimer: I mostly do Lucene, not Solr (yet), so don't *even* ask
me how to change this behavior <G>. But your comment about
frequent updates to the index prompted this question...
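
A rough check against the numbers in your mail, assuming each commit
opens a new searcher (throwing away its sort data) and that most of the
~75s sorted QTime is that one-off build. Both are my assumptions, and
the constant and function names are made up:

```python
# Figures quoted from the mail below.
COMMIT_INTERVAL_S = 15       # "a database commit every 15 seconds"
SORTED_QTIME_S = 75          # "The QTime is ~75s" with sort=date_added asc
INDEX_RATE_DOCS_PER_S = 2    # "around two documents / sec"


def searchers_per_hour(commit_interval_s: int) -> int:
    # Assumption: every commit opens a new searcher with an empty sort cache.
    return 3600 // commit_interval_s


def docs_per_commit(rate_docs_per_s: int, commit_interval_s: int) -> int:
    return rate_docs_per_s * commit_interval_s


def cache_can_warm_up(commit_interval_s: int, build_s: int) -> bool:
    # Assumption: the sorted QTime is dominated by the cache build. If the
    # build takes longer than the gap between commits, a newer searcher
    # has already been opened by the time the build finishes.
    return commit_interval_s > build_s


print(searchers_per_hour(COMMIT_INTERVAL_S))                      # 240
print(docs_per_commit(INDEX_RATE_DOCS_PER_S, COMMIT_INTERVAL_S))  # 30
print(cache_can_warm_up(COMMIT_INTERVAL_S, SORTED_QTIME_S))       # False
```

If those assumptions hold, sorted queries would almost always hit a
cold searcher.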

Best
Erick

On Feb 12, 2008 3:54 AM, James Brady <[EMAIL PROTECTED]> wrote:

> Hi again,
> More analysis showed that the extraordinarily long query times appear
> only when I specify a sort. A concrete example:
>
> For a query string such as:
>   ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
> the QTime is ~500ms.
> For a query string such as:
>   ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
> the QTime is ~75s.
>
> I.e. I am using the StandardRequestHandler to search for a
> user-entered term ("apache" above) and filtering by a user_id field.
>
> This seems to be the case for every sort option except score asc and
> score desc. Please tell me Solr doesn't sort all matching documents
> before applying boolean filters?
>
> James
>
> Begin forwarded message:
>
> > From: James Brady <[EMAIL PROTECTED]>
> > Date: 11 February 2008 23:38:16 GMT-08:00
> > To: solr-user@lucene.apache.org
> > Subject: Performance help for heavy indexing workload
> >
> > Hello,
> > I'm looking for some configuration guidance to help improve
> > performance of my application, which tends to do a lot more
> > indexing than searching.
> >
> > At present, it needs to index around two documents / sec - a
> > document being the stripped content of a webpage. However,
> > performance was so poor that I've had to disable indexing of the
> > webpage content as an emergency measure. In addition, some search
> > queries take an inordinate length of time - regularly over 60 seconds.
> >
> > This is running on a medium-sized EC2 instance (2 x 2GHz Opterons
> > and 8GB RAM), and there's not much else going on on the box. In
> > total, there are about 1.5m documents in the index.
> >
> > I'm using a fairly standard configuration - the things I've tried
> > changing so far have been parameters like maxMergeDocs, mergeFactor
> > and the autoCommit options. I'm only using the
> > StandardRequestHandler, no faceting. I have a scheduled task
> > causing a database commit every 15 seconds.
> >
> > Obviously, every workload varies, but could anyone comment on
> > whether this sort of hardware should, with proper configuration, be
> > able to manage this sort of workload?
> >
> > I can't see signs of Solr being IO-bound, CPU-bound or memory-
> > bound, although my scheduled commit operation, or perhaps GC, does
> > spike up the CPU utilisation at intervals.
> >
> > Any help appreciated!
> > James
>
>
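
Decoding the two query strings quoted above with Python's standard
`urllib.parse.parse_qs` confirms that the only difference between the
~500ms and ~75s requests is the `sort` parameter. The `decode` helper
and the `BASE`/`FAST`/`SLOW` names are made up for this check:

```python
from typing import Dict
from urllib.parse import parse_qs

BASE = ("indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1"
        "&fl=*%2Cscore&qt=standard&wt=standard&explainOther=")
FAST = BASE                             # QTime ~500ms
SLOW = BASE + "&sort=date_added%20asc"  # QTime ~75s


def decode(qs: str) -> Dict[str, str]:
    # keep_blank_values keeps the empty explainOther= parameter.
    return {k: v[0] for k, v in parse_qs(qs, keep_blank_values=True).items()}


fast, slow = decode(FAST), decode(SLOW)
print(fast["q"])                       # apache user_id:39
print(fast["fl"])                      # *,score
print(slow["sort"])                    # date_added asc
print(sorted(set(slow) - set(fast)))   # ['sort']
```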