Hi,

A bit hard to tell without seeing all system and Solr metrics, but the
first thing I'd look at is disk IO.  Yes, you may warm up the index, but
you still have some new data on disk after replication, so if a user query
hits data in newly copied blocks, it will *have to* be read from disk.  See
SPM for Solr - it monitors this sort of stuff.

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Sat, Dec 1, 2012 at 5:05 PM, Ryan Gardner <ryeb...@gmail.com> wrote:

> I'm working on performance tuning / testing a solr 4 cluster - the slaves
> are handling all the queries and they are currently replicating from the
> master every minute.
>
> Right now for my load test, I'm playing back random queries form a set of
> about 5M queries harvested from our existing production servers. Whenever
> the replication ends and the new searcher gets used there is a jump in
> QTimes.
>
> The QTimes jump briefly but ONLY after replication. I tried adding in some
> warming queries to create facets etc - and I analyzed the queries and they
> all seem relatively similar (use a large number of facets etc) - but even
> when I have converted those exact queries over to the format that is in the
> logs, it doesn't seem to eliminate the "post-repl bump" of qtimes jumping
> from single digits to 500 - 800ms.
>
> Here's an example of what i'm talking about.
>
> -- replication ends (message about deleting temporary index in the logs)
> date  |  qtime
> 21:50:45.159 | 0 |
> 21:50:45.170 | 1 |
> 21:50:45.174 | 0 |
> 21:50:45.182 | 2 |
> 21:50:45.185 | 78 |
> 21:50:45.194 | 1 |
> 21:50:45.201 | 69 |
> 21:50:45.206 | 0 |
> 21:50:45.211 | 0 |
> 21:50:45.219 | 0 |
> 21:50:45.286 | 1 |
> 21:50:45.288 | 0 |
> 21:50:45.301 | 30 |
> 21:50:45.317 | 15 |
> 21:50:45.327 | 2 |
> 21:50:45.327 | 1 |
> 21:50:45.334 | 0 |
> 21:50:45.337 | 1 |
> 21:50:45.345 | 1 |
> 21:50:45.347 | 2 |
> 21:50:45.392 | 19 |
> 21:50:45.415 | 47 |
> 21:50:45.428 | 1 |
> 21:50:45.438 | 2 |
> 21:50:45.453 | 1 |
> 21:50:45.468 | 3 |
> 21:50:45.507 | 4 |
> 21:50:45.551 | 1 |
> 21:50:45.617 | 92 |
> 21:50:45.617 | 251 |
> 21:50:45.619 | 457 |
> 21:50:45.632 | 1 |
> 21:50:45.731 | 500 |
> 21:50:45.731 | 437 |
> 21:50:45.731 | 526 |
> 21:50:45.731 | 514 |
> 21:50:45.731 | 354 |
> 21:50:45.731 | 531 |
> 21:50:45.731 | 525 |
> 21:50:45.731 | 525 |
> 21:50:45.732 | 502 |
> 21:50:45.732 | 452 |
> 21:50:45.732 | 278 |
> 21:50:45.732 | 527 |
> 21:50:45.732 | 270 |
> 21:50:45.732 | 576 |
> 21:50:45.733 | 221 |
> 21:50:45.733 | 225 |
> 21:50:45.734 | 265 |
> 21:50:45.735 | 370 |
> 21:50:45.737 | 551 |
> 21:50:45.737 | 517 |
> 21:50:45.738 | 440 |
> 21:50:45.738 | 477 |
> 21:50:45.738 | 43 |
> 21:50:45.738 | 299 |
> 21:50:45.739 | 541 |
> 21:50:45.825 | 1 |
> 21:50:45.838 | 5 |
> 21:50:45.848 | 0 |
> 21:50:45.852 | 0 |
> 21:50:45.859 | 19 |
> 21:50:45.875 | 6 |
> 21:50:45.876 | 7 |
> 21:50:45.881 | 0 |
> 21:50:45.883 | 14 |
> 21:50:45.886 | 1 |
> 21:50:45.890 | 4 |
> 21:50:45.891 | 3 |
> 21:50:45.894 | 1 |
> 21:50:45.902 | 1 |
> 21:50:45.906 | 4 |
> 21:50:45.908 | 3 |
> 21:50:45.918 | 18 |
> 21:50:45.921 | 10 |
>
> The DocumentCache in our case is not very useful because of our 1 minute
> replication pattern. I have it sized to 1024 elements now. When I've tried
> to increase the size, it caused our GC pause times to skyrocket. (Currently
> it's tuned so it has a 250ms GC pause roughly every 16 seconds - and I've
> verified that the above QTime bump is not due to GC activity)
>
> Is there something I can do to help with these? (short of increasing the
> replication time to a longer interval to mitigate the impact of these
> bumps)?
>
> I guess the real question I have is "Why do queries get faster after a
> second or so after replication? How can I try to get it to do that as part
> of the newSearcher warming??" - Like I said, I've copied some of those slow
> queries and put them into the newSearcher warming section to see if "well,
> maybe running through a few dozen of these searches that's what makes it
> get faster" - but that hasn't helped.
>
> Our index file is stored on disk but the OS basically has it all cached in
> RAM (I tested moving it to tmpfs but saw no improvement in speed - so I
> went back to putting it on disk) - and the CPU is not anywhere near taxed
> (it's got 24 cores on this machine)
>
> So far the performance of SOLR has been stellar - and once we finish tuning
> this we'll write up how we've tuned it / how we're using it to share out
> more widely with anyone who cares - but the one perplexing thing is this
> bump in bad times after replication.
>
> Any thoughts would be appreciated.
>
> Ryan
>

Reply via email to