take a look here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Memory consumption can be tricky to interpret with MMapDirectory.

But you say "I see the CPU working very hard", which implies that your issue
is just scoring 90M documents. A way to test: try q=*:*&fq=field:book. My
bet is that it will be much faster, in which case scoring is your choke
point and you'll need to spread that load across more servers, i.e. shard.
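
The comparison can be sketched like this (host and core names here are
hypothetical, and "field:book" stands in for your actual query term):

```python
# Sketch of the q vs. fq comparison; localhost:8983/mycore are assumptions.
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mycore/select"

# Normal query: every matching document is scored.
scored = base + "?" + urlencode({"q": "field:book", "rows": 10})

# Match-all query with the term moved into a filter: no per-document
# relevance scoring for field:book, and the filter's bitset is cached
# in the filterCache after the first run.
filtered = base + "?" + urlencode({"q": "*:*", "fq": "field:book", "rows": 10})

print(scored)
print(filtered)
```

Time both (e.g. compare QTime in the responses) with a cold filterCache; if
the fq variant is much faster, scoring is the bottleneck.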

When running the above, make sure of a couple of things:
1> you haven't run the fq query before (or you have filterCache turned
completely off).
2> you _have_ run a query or two that warms up your low-level caches.
Doesn't matter what, just as long as it doesn't have an fq clause.
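
Both of those knobs live in solrconfig.xml. A sketch (the sizes and warming
queries are illustrative only, not recommendations for your index):

```xml
<!-- For the test, disable the filterCache by setting its size to 0 -->
<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>

<!-- Warm the low-level caches on startup with a couple of plain queries
     (no fq clause), so the fq test measures scoring, not cold disk reads -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">book</str><str name="rows">10</str></lst>
    <lst><str name="q">some other common term</str><str name="rows">10</str></lst>
  </arr>
</listener>
```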

Best
Erick



On Sat, Mar 23, 2013 at 3:10 AM, David Parks <davidpark...@yahoo.com> wrote:

> I see the CPU working very hard, and at the same time I see 2 MB/sec disk
> access for that 15 seconds. I am not running it this instant, but it seems
> to me that there were more CPU cycles available, so unless it's an issue of
> not being able to multithread it any further, I'd say it's more IO related.
>
> I'm going to set up solr cloud and shard across the 2 servers I have
> available for now. It's not an optimal setup we have while we're in a
> private beta period, but maybe it'll improve things (I've got 2 servers
> with
> 2x 4TB disks in raid-0 shared with the webservers).
>
> I'll work towards some improved IO performance and maybe more shards and
> see
> how things go. I'll also be able to up the RAM in just a couple of weeks.
>
> Are there any settings I should think of in terms of improving cache
> performance when I can give it say 10GB of RAM?
>
> Thanks, this has been tremendously helpful.
>
> David
>
>
> -----Original Message-----
> From: Tom Burton-West [mailto:tburt...@umich.edu]
> Sent: Saturday, March 23, 2013 1:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Slow queries for common terms
>
> Hi David and Jan,
>
> I wrote the blog post, and David, you are right, the problem we had was
> with
> phrase queries because our positions lists are so huge. Boolean
> queries don't need to read the positions lists. I think you need to
> determine whether you are CPU bound or I/O bound. It is possible that
> you are I/O bound and reading the term frequency postings for 90 million
> docs is taking a long time. In that case, more memory in the machine (but
> not dedicated to Solr) might help because Solr relies on OS disk caching
> for
> caching the postings lists.  You would still need to do some cache warming
> with your most common terms.
>
> On the other hand as Jan pointed out, you may be cpu bound because Solr
> doesn't have early termination and has to rank all 90 million docs in order
> to show the top 10 or 25.
>
> Did you try the OR search to see if your CPU is at 100%?
>
> Tom
>
> On Fri, Mar 22, 2013 at 10:14 AM, Jan Høydahl <jan....@cominvent.com>
> wrote:
>
> > Hi
> >
> > There might not be a final cure with more RAM if you are CPU bound.
> > Scoring 90M docs is some work. Can you check what's going on during
> > those
> > 15 seconds? Is your CPU at 100%? Try a (foo OR bar OR baz) search
> > which generates >100 million hits and see if that is slow too, even if you
> > don't use frequent words.
> >
> > I'm sure you can find other frequent terms in your corpus which
> > display similar behaviour, words which are even more frequent than
> > "book". Are you using "AND" as default operator? You will benefit from
> > limiting the number of results as much as possible.
> >
> > The real solution is to shard across N number of servers, until you
> > reach the desired performance for the desired indexing/querying load.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> >
>
>
