A couple of things. First, you are not swapping, which is a good thing. Second
(I am not sure what delay you selected for dstat; I would assume the default of
1 second), there is some pretty heavy write activity, like this:

26   1  71   2   0   0|4096B 1424k|   0     0 | 719   415 | 197M   11G|1.00  46.0 |4.0 9.0   0  13

where you are writing out 1424k (about 1.4MB) in one second, for example. This
is happening pretty regularly, so I suspect you are swamping your drive.

You might also want to run atop and check the drive busy percentage; I would
guess that you are hitting high percentages.

François

On May 1, 2011, at 4:29 PM, Daniel Huss wrote:

> 
> Thanks for the tool recommendation! This is the dstat output during
> commit bombardment / concurrent search requests:
> 
> ----total-cpu-usage---- -dsk/total- ---paging-- ---system-- ----swap--- --io/total- ---file-locks--
> usr sys idl wai hiq siq| read  writ|  in   out | int   csw | used  free| read  writ|pos lck rea wri
> 11   1  87   1   0   0|1221k  833k| 538B  828B| 784   920 | 197M   11G|16.8  15.5 |4.0 9.0   0  13
> 60   0  40   0   0   0|   0     0 |   0     0 | 811   164 | 197M   11G|   0     0 |4.0 9.0   0  13
> 25   0  75   0   0   0|   0     0 |   0     0 | 576    85 | 197M   11G|   0     0 |4.0 9.0   0  13
> 25   0  75   0   0   0|   0     0 |   0     0 | 572    90 | 197M   11G|   0     0 |4.0 9.0   0  13
> 25   0  74   0   0   0|   0     0 |   0     0 | 730   204 | 197M   11G|   0     0 |4.0 9.0   0  13
> 26   1  71   2   0   0|4096B 1424k|   0     0 | 719   415 | 197M   11G|1.00  46.0 |4.0 9.0   0  13
> 31   1  68   0   0   0|   0   136k|   0     0 | 877   741 | 197M   11G|   0  6.00 |5.0 9.0   0  14
> 70   6  24   0   0   0|   0   516k|   0     0 |1705  1027 | 197M   11G|   0  46.0 |5.0  11 1.0  15
> 72   3  25   0   0   0|4096B  384k|   0     0 |1392   910 | 197M   11G|1.00  25.0 |5.0 9.0   0  14
> 60   2  25  12   0   0| 688k  108k|   0     0 |1162   509 | 197M   11G|79.0  9.00 |4.0 9.0   0  13
> 94   1   5   0   0   0| 116k    0 |   0     0 |1271   654 | 197M   11G|4.00     0 |4.0 9.0   0  13
> 57   0  43   0   0   0|   0     0 |   0     0 |1076   238 | 197M   11G|   0     0 |4.0 9.0   0  13
> 26   0  73   0   0   0|   0    16k|   0     0 | 830   188 | 197M   11G|   0  2.00 |4.0 9.0   0  13
> 29   1  70   0   0   0|   0     0 |   0     0 |1088   360 | 197M   11G|   0     0 |4.0 9.0   0  13
> 29   1  70   0   0   1|   0   228k|   0     0 | 890   590 | 197M   11G|   0  21.0 |4.0 9.0   0  13
> 81   6  13   0   0   0|4096B 1596k|   0     0 |1227   441 | 197M   11G|1.00  52.0 |5.0 9.0   0  14
> 48   2  48   1   0   0| 172k    0 |   0     0 | 953   292 | 197M   11G|21.0     0 |5.0 9.0   0  14
> 25   0  74   0   0   0|   0     0 |   0     0 | 808   222 | 197M   11G|   0     0 |5.0 9.0   0  14
> 25   0  74   0   0   0|   0     0 |   0     0 | 607    90 | 197M   11G|   0     0 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0     0 |   0     0 | 603   106 | 197M   11G|   0     0 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0   144k|   0     0 | 625   104 | 197M   11G|   0  7.00 |5.0 9.0   0  14
> 85   3   9   2   0   0| 248k   92k|   0     0 |1441   887 | 197M   11G|33.0  7.00 |5.0 9.0   0  14
> 32   1  65   2   0   0| 404k  636k|   0     0 | 999   337 | 197M   11G|38.0  96.0 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0     0 |   0     0 | 609   117 | 197M   11G|   0     0 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0     0 |   0     0 | 604    77 | 197M   11G|   0     0 |5.0 9.0   0  14
> 26   0  74   0   0   0|   0     0 |   0     0 | 781   183 | 197M   11G|   0     0 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0     0 |   0     0 | 620   110 | 197M   11G|   0     0 |5.0 9.0   0  14
> 46   4  50   0   0   0|   0   116k|   0     0 | 901   398 | 197M   11G|   0  12.0 |4.0 9.0   0  13
> 50   2  47   0   0   0|   0     0 |   0     0 |1031   737 | 197M   11G|   0     0 |5.0 9.0   0  14
> 28   1  71   0   0   0|4096B  168k|   0     0 | 800   254 | 197M   11G|1.00  9.00 |5.0 9.0   0  14
> 25   0  75   0   0   0|   0     0 |   0     0 | 571    84 | 197M   11G|   0     0 |5.0 9.0   0  14
> 26   0  73   1   0   0|   0  1172k|   0     0 | 632   209 | 197M   11G|   0  40.0 |5.0 9.0   0  14
> 
> 
> For the short term, we should be fine if we put those single-document
> jobs in a queue that gets flushed every 60 seconds.
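> 
> Roughly along these lines (a minimal SolrJ sketch; the class name
> QueuedIndexer and the fixed 60-second interval are just illustrative, and it
> assumes a CommonsHttpSolrServer pointed at our Solr URL):
> 
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.concurrent.Executors;
>     import java.util.concurrent.ScheduledExecutorService;
>     import java.util.concurrent.TimeUnit;
>     import org.apache.solr.client.solrj.SolrServer;
>     import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>     import org.apache.solr.common.SolrInputDocument;
> 
>     /** Buffers single-document jobs and flushes them to Solr once a minute. */
>     public class QueuedIndexer {
>         private final SolrServer solr;
>         private final List<SolrInputDocument> pending =
>             new ArrayList<SolrInputDocument>();
>         private final ScheduledExecutorService scheduler =
>             Executors.newSingleThreadScheduledExecutor();
> 
>         public QueuedIndexer(String solrUrl) throws Exception {
>             solr = new CommonsHttpSolrServer(solrUrl);
>             // One flush (add + commit) every 60 seconds instead of a commit
>             // per document.
>             scheduler.scheduleAtFixedRate(new Runnable() {
>                 public void run() { flush(); }
>             }, 60, 60, TimeUnit.SECONDS);
>         }
> 
>         public synchronized void enqueue(SolrInputDocument doc) {
>             pending.add(doc);
>         }
> 
>         synchronized void flush() {
>             if (pending.isEmpty()) return;
>             try {
>                 solr.add(pending);  // one batched update request
>                 solr.commit();      // one fsync-heavy commit per minute
>                 pending.clear();
>             } catch (Exception e) {
>                 e.printStackTrace(); // real code would log and retry
>             }
>         }
>     }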
> 
> Also, I should have mentioned that our index is currently 27 GB, containing
> 23,223,885 "documents" (only the PK is actually stored). For some reason I
> was assuming commit time to be constant, but that is probably not the
> case (?)
> 
> Sooner or later someone is going to profile the container that runs Solr
> and our document streamer. I'll post the results if we find anything of
> interest.
> 
> =================================
> 
> As a side note, I've only just discovered that Solr 3.1 has been released
> (yaaaay!). We're currently using 1.4.1.
> 
>> If you are on Linux, I would recommend two tools you can use to track what 
>> is going on on the machine: atop ( http://freshmeat.net/projects/atop/ ) and 
>> dstat ( http://freshmeat.net/projects/dstat/ ).
>> 
>> atop in particular has been very useful to me in tracking down performance 
>> issues in real time (when I am running a process) or at random intervals 
>> (when the machine slows down for no apparent reason).
>> 
>> From the little you have told us, my hunch is that you are saturating a disk 
>> somewhere, either the index disk or swap (as pointed out by Mike).
>> 
>> Cheers
>> 
>> François
>> 
>> On May 1, 2011, at 9:54 AM, Michael McCandless wrote:
>> 
>>> Committing too frequently is very costly, since this calls fsync on
>>> numerous files under the hood, which strains the IO system and can cut
>>> into queries. If you really want to commit frequently, turning on the
>>> compound file format could help, since that's one file to fsync instead
>>> of N per segment.
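>>> 
>>> For illustration, a rough sketch of what that looks like at the Lucene 3.x
>>> level (in Solr you would normally flip the equivalent useCompoundFile
>>> setting in solrconfig.xml rather than write code; class and method names
>>> here are made up for the example):
>>> 
>>>     import java.io.File;
>>>     import java.io.IOException;
>>>     import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>     import org.apache.lucene.index.IndexWriter;
>>>     import org.apache.lucene.index.IndexWriterConfig;
>>>     import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>     import org.apache.lucene.store.Directory;
>>>     import org.apache.lucene.store.FSDirectory;
>>>     import org.apache.lucene.util.Version;
>>> 
>>>     public class CompoundFileExample {
>>>         public static IndexWriter openWriter(File indexDir) throws IOException {
>>>             Directory dir = FSDirectory.open(indexDir);
>>>             // Newly written segments use the compound (.cfs) format, so a
>>>             // commit fsyncs one file per segment instead of many.
>>>             LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
>>>             mp.setUseCompoundFile(true);
>>>             IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
>>>                 new StandardAnalyzer(Version.LUCENE_31));
>>>             cfg.setMergePolicy(mp);
>>>             return new IndexWriter(dir, cfg);
>>>         }
>>>     }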
>>> 
>>> Also, if you have a large merge running (turning on IW's infoStream
>>> will tell you), this can cause the OS to swap pages out, unless you
>>> set swappiness (if you're on Linux) to 0.
>>> 
>>> Finally, beware of having too-large a JVM max heap; you may accumulate
>>> long-lived, uncollected garbage, which the OS may happily swap out
>>> (since the pages are never touched), which then kills performance when
>>> GC finally runs.  I describe this here:
>>> http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html
>>> It's good to leave some RAM for the OS to use as IO cache.
>>> 
>>> Ideally, merging should not evict pages from the OS's buffer cache,
>>> but unfortunately the low-level IO flags to control this (eg
>>> fadvise/madvise) are not available in Java (I wrote about that here:
>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>> 
>>> However, we have a GSoC student this summer working on the problem
>>> (see https://issues.apache.org/jira/browse/LUCENE-2795), so after this
>>> is done we'll have a NativeUnixDirectory impl that hopefully prevents
>>> buffer cache eviction due to merging without you having to tweak
>>> swappiness settings.
>>> 
>>> Mike
>>> 
>>> http://blog.mikemccandless.com
>>> 
>>> On Sat, Apr 30, 2011 at 9:23 PM, Craig Stires <craig.sti...@gmail.com> 
>>> wrote:
>>>> Daniel,
>>>> 
>>>> I've been able to post documents to Solr without degrading the performance
>>>> of search.  But, I did have to make some changes to the solrconfig.xml
>>>> (ramBufferSize, mergeFactor, autoCommit, etc).
>>>> 
>>>> What I found to be helpful was having a look at what was causing the OS
>>>> to grind.  If your system is swapping too much to disk, you can check
>>>> whether bumping up the JVM heap (-Xms512m -Xmx1024m) alleviates it.  Even
>>>> if this isn't the fix, you can at least isolate whether it's a memory
>>>> issue or a disk I/O issue (e.g. running an optimize on every commit).
>>>> 
>>>> 
>>>> Also, it is worth having a look in your logs to see whether the server is
>>>> complaining about memory, having issues with your schema, or hitting some
>>>> other unexpected problem.
>>>> 
>>>> A resource that has been helpful for me:
>>>> http://wiki.apache.org/solr/SolrPerformanceFactors
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Daniel Huss [mailto:hussdl1985-solrus...@yahoo.de]
>>>> Sent: Sunday, 1 May 2011 5:35 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Searching performance suffers tremendously during indexing
>>>> 
>>>> Hi everyone,
>>>> 
>>>> Our Solr-based search is unresponsive while documents are being indexed.
>>>> The documents to index (results of a DB query) are sent to Solr by a
>>>> daemon in batches of varying size. The number of documents per batch may
>>>> vary between one and several hundred thousand.
>>>> 
>>>> Before investigating any further, I would like to ask whether this can be
>>>> considered an issue at all. I was expecting Solr to handle concurrent
>>>> indexing/searching quite well; in fact, this was one of the main reasons
>>>> for choosing Solr over the search capabilities of our RDBMS.
>>>> 
>>>> Is searching performance *supposed* to drop while documents are being
>>>> indexed?
>>>> 
>>>> 
> 
