Hi David and Jan,

I wrote the blog post, and David, you are right: the problem we had was
with phrase queries, because our positions lists are so huge.  Boolean
queries don't need to read the positions lists.  I think you need to
determine whether you are CPU bound or I/O bound.  It is possible that
you are I/O bound and that reading the term frequency postings for 90 million
docs is taking a long time.  In that case, more memory in the machine (left
to the OS rather than allocated to Solr's JVM heap) might help, because Solr
relies on the OS disk cache for caching the postings lists.  You would still
need to do some cache warming with your most common terms.
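A toy model of why the query type matters: with an inverted index, an AND query only has to intersect the doc IDs for each term (term frequencies are enough for scoring), while a phrase query must additionally read each candidate doc's position list to check that the terms are adjacent. This is a simplified in-memory sketch, not Lucene's on-disk postings format; `build_index`, `boolean_and` and `phrase` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Toy postings: term -> {doc_id: [positions]} (hypothetical, not Lucene's layout)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def boolean_and(index, terms):
    """AND query: only the doc IDs per term are consulted, never the positions."""
    doc_sets = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []


def phrase(index, terms):
    """Phrase query: start from the AND matches, then read every position list."""
    hits = []
    for doc_id in boolean_and(index, terms):
        # Candidate start positions of the phrase in this doc.
        starts = set(index[terms[0]][doc_id])
        for offset, term in enumerate(terms[1:], start=1):
            # Keep only starts where `term` appears exactly `offset` words later.
            starts &= {p - offset for p in index[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


docs = ["the book of the year", "year of the book", "a book about the year"]
idx = build_index(docs)
print(boolean_and(idx, ["the", "book"]))  # → [0, 1, 2]
print(phrase(idx, ["the", "book"]))       # → [0, 1]
```

Even in this toy version, the phrase query does strictly more reading per candidate document, which is why very large position lists hurt phrase queries but not plain boolean ones.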

On the other hand, as Jan pointed out, you may be CPU bound, because Solr
doesn't have early termination and has to score all 90 million matching docs
in order to show the top 10 or 25.
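To see why the cost tracks the number of hits rather than the page size: without early termination, every matching doc is scored and offered to a bounded priority queue that keeps the best N seen so far. This is a simplified illustration of that idea (Lucene's top-docs collectors use a priority queue in a broadly similar way, but this is not their code); `top_k` and the synthetic scores are hypothetical.

```python
import heapq


def top_k(scored_docs, k):
    """Exhaustive top-k over (doc_id, score) pairs.

    Every pair is examined, so the loop runs once per matching doc no matter
    how small k is; only the heap size is bounded by k.
    """
    heap = []  # (score, doc_id); heap[0] is the weakest of the current top k
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Better than the current k-th best: evict it.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)  # best first


# Synthetic "hits" with made-up scores; showing 10 results still means
# visiting all 200,000 of them.
hits = ((doc_id, (doc_id * 7919) % 1000) for doc_id in range(200_000))
print(top_k(hits, 10))
```

Asking for 10 or 25 results only changes the heap size; the per-hit scoring loop is what grows with 90 million matches, which is why a broad OR query is a good way to check whether the CPU is the bottleneck.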

Did you try the OR search to see if your CPU is at 100%?

Tom

On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi
>
> More RAM may not be a cure if you are CPU bound.
> Scoring 90M docs is a lot of work. Can you check what's going on during those
> 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search which
> generates >100 million hits and see if that is slow too, even if you don't use
> frequent words.
>
> I'm sure you can find other frequent terms in your corpus which display
> similar behaviour, words which are even more frequent than "book". Are you
> using "AND" as the default operator? You will benefit from limiting the number
> of results as much as possible.
>
> The real solution is to shard across N servers until you reach
> the desired performance for the desired indexing/querying load.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
>
