That does seem really slow. Is the index on NFS-mounted storage?
wunder
On 2/12/08 7:04 AM, "Erick Erickson" <[EMAIL PROTECTED]> wrote:
> Well, the *first* sort against the underlying Lucene engine is expensive,
> since it has to build up the terms to sort on. I wonder if you're closing and
> opening the underlying searcher for every request? That would be a definite limiter.
>
> Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask
> me how to change this behavior <G>. But your comment about
> frequent updates to the index prompted this question....
>
> Best
> Erick
>
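To make Erick's point concrete, here is a toy model of a sort cache that is built lazily and belongs to one searcher. It is only a sketch: ToySearcher, sort_cache and cache_builds are made-up names for illustration, not Lucene or Solr classes, and the real engine's caching is more involved.

```python
class ToySearcher:
    """Toy stand-in for an index searcher; not Lucene's actual implementation."""

    def __init__(self, docs):
        self.docs = docs          # list of dicts, one per document
        self.sort_cache = {}      # field name -> per-document sort values
        self.cache_builds = 0     # how many times a sort array was built

    def _sort_values(self, field):
        # The expensive step: visit every document in the index once,
        # regardless of how many documents the query matched.
        if field not in self.sort_cache:
            self.sort_cache[field] = [doc[field] for doc in self.docs]
            self.cache_builds += 1
        return self.sort_cache[field]

    def search(self, predicate, sort_field, rows=1):
        values = self._sort_values(sort_field)
        hits = [i for i, doc in enumerate(self.docs) if predicate(doc)]
        hits.sort(key=lambda i: values[i])   # ascending sort on the cached values
        return [self.docs[i] for i in hits[:rows]]


def is_user_39(doc):
    return doc["user_id"] == 39


docs = [{"user_id": i % 50, "date_added": 1000 - i} for i in range(1000)]

searcher = ToySearcher(docs)
searcher.search(is_user_39, "date_added")   # first sort builds the array
searcher.search(is_user_39, "date_added")   # second sort reuses it

reopened = ToySearcher(docs)                # a freshly opened searcher
reopened.search(is_user_39, "date_added")   # pays the build cost again
```

In this model the cost of the first sorted query grows with the size of the whole index, not with the number of hits, and a new searcher starts with an empty cache. If a searcher is reopened more often than sorted queries arrive, nearly every sorted query lands on the slow path.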
> On Feb 12, 2008 3:54 AM, James Brady <[EMAIL PROTECTED]> wrote:
>
>> Hi again,
>> More analysis showed that the extraordinarily long query times only
>> appeared when I specify a sort. A concrete example:
>>
>> For a querystring such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=
>> The QTime is ~500ms.
>> For a querystring such as:
>> ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&sort=date_added%20asc
>> The QTime is ~75s.
>>
>> I.e. I am using the StandardRequestHandler to search for a
>> user-entered term ("apache" above) and filtering by a user_id field.
>>
>> This seems to be the case for every sort option except score asc and
>> score desc. Please tell me Solr doesn't sort all matching documents
>> before applying boolean filters?
>>
>> James
>>
>> Begin forwarded message:
>>
>>> From: James Brady <[EMAIL PROTECTED]>
>>> Date: 11 February 2008 23:38:16 GMT-08:00
>>> To: [email protected]
>>> Subject: Performance help for heavy indexing workload
>>>
>>> Hello,
>>> I'm looking for some configuration guidance to help improve
>>> performance of my application, which tends to do a lot more
>>> indexing than searching.
>>>
>>> At present, it needs to index around two documents / sec - a
>>> document being the stripped content of a webpage. However,
>>> performance was so poor that I've had to disable indexing of the
>>> webpage content as an emergency measure. In addition, some search
>>> queries take an inordinate length of time - regularly over 60 seconds.
>>>
>>> This is running on a medium sized EC2 instance (2 x 2GHz Opterons
>>> and 8GB RAM), and there's not too much else going on on the box. In
>>> total, there are about 1.5m documents in the index.
>>>
>>> I'm using a fairly standard configuration - the things I've tried
>>> changing so far have been parameters like maxMergeDocs, mergeFactor
>>> and the autoCommit options. I'm only using the
>>> StandardRequestHandler, no faceting. I have a scheduled task
>>> causing a database commit every 15 seconds.
>>>
>>> Obviously, every workload varies, but could anyone comment on
>>> whether this sort of hardware should, with proper configuration, be
>>> able to manage this sort of workload?
>>>
>>> I can't see signs of Solr being IO-bound, CPU-bound or
>>> memory-bound, although my scheduled commit operation, or perhaps
>>> GC, does spike up the CPU utilisation at intervals.
>>>
>>> Any help appreciated!
>>> James
>>
>>
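For anyone wanting to repeat James's sorted vs. unsorted comparison, a small script along these lines might help. It is a sketch under assumptions: the SOLR_SELECT URL is a placeholder for the real host and path, and build_query_url and parse_qtime are hypothetical helper names. It only builds the two request URLs and reads QTime out of the standard XML response header.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; substitute the real host, port and path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(user_term, user_id, sort=None, base=SOLR_SELECT):
    """Build a select URL like the ones quoted above, optionally with a sort."""
    params = {
        "q": f"{user_term} user_id:{user_id}",
        "start": 0,
        "rows": 1,
        "fl": "*,score",
        "qt": "standard",
        "wt": "standard",
    }
    if sort is not None:
        params["sort"] = sort
    return base + "?" + urlencode(params)


def parse_qtime(response_xml):
    """Return QTime (milliseconds) from the responseHeader of an XML response."""
    root = ET.fromstring(response_xml)
    node = root.find("./lst[@name='responseHeader']/int[@name='QTime']")
    if node is None:
        raise ValueError("no QTime found in response header")
    return int(node.text)


unsorted_url = build_query_url("apache", 39)
sorted_url = build_query_url("apache", 39, sort="date_added asc")
```

Fetching each URL a few times in a row (for example with urllib.request.urlopen) and passing the body to parse_qtime should show whether only the first sorted request after a commit is slow, or every one.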