A couple of things. One, you are not swapping, which is a good thing. Second (and I am not sure what delay you selected for dstat; I would assume the default of 1 second), there is some pretty heavy write activity, like this:
26 1 71 2 0 0|4096B 1424k| 0 0 | 719 415 | 197M 11G|1.00 46.0 |4.0 9.0 0 13

where you are writing out about 1.4MB in a single second, for example. This is happening pretty regularly, so I suspect you are swamping your drive. You might also want to run atop and check the drive busy percentage; I would guess that you are hitting high percentages.

François

On May 1, 2011, at 4:29 PM, Daniel Huss wrote:

>
> Thanks for the tool recommendation! This is the dstat output during commit bombardment / concurrent search requests:
>
> ----total-cpu-usage---- -dsk/total- ---paging-- ---system-- ----swap--- --io/total- ---file-locks--
> usr sys idl wai hiq siq| read writ| in out | int csw | used free| read writ|pos lck rea wri
> 11 1 87 1 0 0|1221k 833k| 538B 828B| 784 920 | 197M 11G|16.8 15.5 |4.0 9.0 0 13
> 60 0 40 0 0 0| 0 0 | 0 0 | 811 164 | 197M 11G| 0 0 |4.0 9.0 0 13
> 25 0 75 0 0 0| 0 0 | 0 0 | 576 85 | 197M 11G| 0 0 |4.0 9.0 0 13
> 25 0 75 0 0 0| 0 0 | 0 0 | 572 90 | 197M 11G| 0 0 |4.0 9.0 0 13
> 25 0 74 0 0 0| 0 0 | 0 0 | 730 204 | 197M 11G| 0 0 |4.0 9.0 0 13
> 26 1 71 2 0 0|4096B 1424k| 0 0 | 719 415 | 197M 11G|1.00 46.0 |4.0 9.0 0 13
> 31 1 68 0 0 0| 0 136k| 0 0 | 877 741 | 197M 11G| 0 6.00 |5.0 9.0 0 14
> 70 6 24 0 0 0| 0 516k| 0 0 |1705 1027 | 197M 11G| 0 46.0 |5.0 11 1.0 15
> 72 3 25 0 0 0|4096B 384k| 0 0 |1392 910 | 197M 11G|1.00 25.0 |5.0 9.0 0 14
> 60 2 25 12 0 0| 688k 108k| 0 0 |1162 509 | 197M 11G|79.0 9.00 |4.0 9.0 0 13
> 94 1 5 0 0 0| 116k 0 | 0 0 |1271 654 | 197M 11G|4.00 0 |4.0 9.0 0 13
> 57 0 43 0 0 0| 0 0 | 0 0 |1076 238 | 197M 11G| 0 0 |4.0 9.0 0 13
> 26 0 73 0 0 0| 0 16k| 0 0 | 830 188 | 197M 11G| 0 2.00 |4.0 9.0 0 13
> 29 1 70 0 0 0| 0 0 | 0 0 |1088 360 | 197M 11G| 0 0 |4.0 9.0 0 13
> 29 1 70 0 0 1| 0 228k| 0 0 | 890 590 | 197M 11G| 0 21.0 |4.0 9.0 0 13
> 81 6 13 0 0 0|4096B 1596k| 0 0 |1227 441 | 197M 11G|1.00 52.0 |5.0 9.0 0 14
> 48 2 48 1 0 0| 172k 0 | 0 0 | 953 292 | 197M 11G|21.0 0 |5.0 9.0 0 14
> 25 0 74 0 0 0| 0 0 | 0 0 | 808 222 | 197M 11G| 0 0 |5.0 9.0 0 14
> 25 0 74 0 0 0| 0 0 | 0 0 | 607 90 | 197M 11G| 0 0 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 0 | 0 0 | 603 106 | 197M 11G| 0 0 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 144k| 0 0 | 625 104 | 197M 11G| 0 7.00 |5.0 9.0 0 14
> 85 3 9 2 0 0| 248k 92k| 0 0 |1441 887 | 197M 11G|33.0 7.00 |5.0 9.0 0 14
> 32 1 65 2 0 0| 404k 636k| 0 0 | 999 337 | 197M 11G|38.0 96.0 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 0 | 0 0 | 609 117 | 197M 11G| 0 0 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 0 | 0 0 | 604 77 | 197M 11G| 0 0 |5.0 9.0 0 14
> 26 0 74 0 0 0| 0 0 | 0 0 | 781 183 | 197M 11G| 0 0 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 0 | 0 0 | 620 110 | 197M 11G| 0 0 |5.0 9.0 0 14
> 46 4 50 0 0 0| 0 116k| 0 0 | 901 398 | 197M 11G| 0 12.0 |4.0 9.0 0 13
> 50 2 47 0 0 0| 0 0 | 0 0 |1031 737 | 197M 11G| 0 0 |5.0 9.0 0 14
> 28 1 71 0 0 0|4096B 168k| 0 0 | 800 254 | 197M 11G|1.00 9.00 |5.0 9.0 0 14
> 25 0 75 0 0 0| 0 0 | 0 0 | 571 84 | 197M 11G| 0 0 |5.0 9.0 0 14
> 26 0 73 1 0 0| 0 1172k| 0 0 | 632 209 | 197M 11G| 0 40.0 |5.0 9.0 0 14
>
> For the short term, we should be fine if we put those single-document jobs in a queue that gets flushed every 60 seconds.
>
> Also, I should have mentioned that our index size is currently 27 GB, containing 23,223,885 "documents" (only the PK is actually stored). For some reason I was assuming the commit time complexity to be constant, but that is probably not the case (?)
>
> Sooner or later someone is going to profile the container that runs Solr and our document streamer. I'll post the results if we find anything of interest.
>
> =================================
>
> As a side note, I've only just discovered that Solr 3.1 has been released (yaaaay!) We're currently using 1.4.1.
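Regarding the 60-second flush queue Daniel mentions above: an alternative (or complement) to batching on the client is to let Solr commit periodically on its own via autoCommit in solrconfig.xml. A minimal sketch, assuming Solr 1.4/3.1-era syntax; the maxDocs value is purely illustrative:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Commit automatically rather than on every update request. -->
    <autoCommit>
      <maxTime>60000</maxTime>   <!-- at most one commit per 60 seconds -->
      <maxDocs>50000</maxDocs>   <!-- illustrative: commit sooner if this many docs are pending -->
    </autoCommit>
  </updateHandler>

With something like this in place the client can just stream adds and skip explicit commits, which sidesteps the per-commit fsync cost Mike describes further down the thread.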
>> If you are on linux, I would recommend two tools you can use to track what is going on on the machine: atop ( http://freshmeat.net/projects/atop/ ) and dstat ( http://freshmeat.net/projects/dstat/ ).
>>
>> atop in particular has been very useful to me in tracking down performance issues in real time (when I am running a process) or at random intervals (when the machine slows down for no apparent reason).
>>
>> From the little you have told us, my hunch is that you are saturating a disk somewhere, either the index disk or swap (as pointed out by Mike).
>>
>> Cheers
>>
>> François
>>
>> On May 1, 2011, at 9:54 AM, Michael McCandless wrote:
>>
>>> Committing too frequently is very costly, since this calls fsync on numerous files under the hood, which strains the IO system and can cut into queries. If you really want to commit frequently, turning on the compound file format could help, since that's one file to fsync instead of N, per segment.
>>>
>>> Also, if you have a large merge running (turning on IW's infoStream will tell you), this can cause the OS to swap pages out, unless you set swappiness (if you're on Linux) to 0.
>>>
>>> Finally, beware of having too large a JVM max heap; you may accumulate long-lived, uncollected garbage, which the OS may happily swap out (since the pages are never touched), which then kills performance when GC finally runs. I describe this here: http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html
>>> It's good to leave some RAM for the OS to use as IO cache.
>>>
>>> Ideally, merging should not evict pages from the OS's buffer cache, but unfortunately the low-level IO flags to control this (eg fadvise/madvise) are not available in Java (I wrote about that here: http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>>
>>> However, we have a GSoC student this summer working on the problem (see https://issues.apache.org/jira/browse/LUCENE-2795), so after this is done we'll have a NativeUnixDirectory impl that hopefully prevents buffer cache eviction due to merging without you having to tweak swappiness settings.
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sat, Apr 30, 2011 at 9:23 PM, Craig Stires <craig.sti...@gmail.com> wrote:
>>>> Daniel,
>>>>
>>>> I've been able to post documents to Solr without degrading the performance of search. But I did have to make some changes to solrconfig.xml (ramBufferSizeMB, mergeFactor, autoCommit, etc.).
>>>>
>>>> What I found helpful was having a look at what was causing the OS to grind. If your system is swapping too much to disk, you can check whether bumping up the RAM (-Xms512m -Xmx1024m) alleviates it. Even if this isn't the fix, you can at least isolate whether it's a memory issue or a disk I/O issue (e.g. running optimization on every commit).
>>>>
>>>> It is also worth having a look in your logs to see if the server is complaining about memory, issues with your schema, or some other unexpected issue.
>>>>
>>>> A resource that has been helpful for me: http://wiki.apache.org/solr/SolrPerformanceFactors
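To make the knobs Craig and Mike mention concrete: in a Solr 1.4/3.1-era solrconfig.xml they live in the <indexDefaults> section (and can be overridden in <mainIndex>). A rough sketch; the numbers are illustrative, not recommendations:

  <indexDefaults>
    <useCompoundFile>true</useCompoundFile>  <!-- one file to fsync per segment, as Mike suggests above -->
    <ramBufferSizeMB>64</ramBufferSizeMB>    <!-- illustrative: a larger buffer means fewer segment flushes -->
    <mergeFactor>10</mergeFactor>            <!-- illustrative: lower means fewer segments to search, but more merge I/O while indexing -->
  </indexDefaults>

Whether these help depends on what is actually grinding, so it is still worth checking the drive busy percentage and the logs first, as suggested above.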
>>>> -----Original Message-----
>>>> From: Daniel Huss [mailto:hussdl1985-solrus...@yahoo.de]
>>>> Sent: Sunday, 1 May 2011 5:35 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Searching performance suffers tremendously during indexing
>>>>
>>>> Hi everyone,
>>>>
>>>> our Solr-based search is unresponsive while documents are being indexed. The documents to index (results of a DB query) are sent to Solr by a daemon in batches of varying size. The number of documents per batch may vary between one and several hundred thousand.
>>>>
>>>> Before investigating any further, I would like to ask whether this can be considered an issue at all. I was expecting Solr to handle concurrent indexing/searching quite well; in fact this was one of the main reasons for choosing Solr over the searching capabilities of our RDBMS.
>>>>
>>>> Is searching performance *supposed* to drop while documents are being indexed?