http://lucene.apache.org/java/3_0_3/api/contrib-misc/org/apache/lucene/index/BalancedSegmentMergePolicy.html

Look in solrconfig.xml for where MergePolicy is configured.
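For example, something along these lines in solrconfig.xml's <indexDefaults>
(and/or <mainIndex>) section should switch the merge policy. Untested -- the
class lives in the Lucene misc contrib jar, which has to be on Solr's
classpath, and the exact <mergePolicy> syntax differs between Solr versions
(older configs take the class name as element text rather than a class
attribute):

  <indexDefaults>
    ...
    <!-- replace the default log merge policy with the balanced one -->
    <mergePolicy class="org.apache.lucene.index.BalancedSegmentMergePolicy"/>
    ...
  </indexDefaults>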
On Sun, May 1, 2011 at 6:31 PM, Lance Norskog <goks...@gmail.com> wrote:
> Yes, indexing generally slows down querying. Most sites do indexing in
> one Solr and run queries from another. The indexing system does 'merging',
> which involves copying data around in files.
>
> With 10g allocated to a 1g load, the JVM is doing a lot more garbage
> collection than it would with a 1.5g allocation. Suggest dropping the
> memory allocation. Also, the operating system is very good at managing
> the disk cache, and this actually works better than Solr caching the
> index data.
>
> The main problem is index merging. If you are on 3.1, there is an
> alternate merge policy called the BalancedSegmentMergePolicy. It is
> fine-tuned for mixing indexing and querying on one Solr; it was written
> at LinkedIn because their web-facing searchers do indexing & searching
> simultaneously.
>
> 2011/5/1 François Schiettecatte <fschietteca...@gmail.com>:
>> A couple of things. First, you are not swapping, which is a good thing.
>> Second (I am not sure what delay you selected for dstat, so I assume the
>> default of 1 second), there is some pretty heavy write activity, like
>> this line:
>>
>>  26   1  71   2   0   0|4096B 1424k|   0    0 | 719  415 | 197M  11G|1.00 46.0 |4.0 9.0   0  13
>>
>> where you are writing out 1.4MB in a single interval. This is happening
>> pretty regularly, so I suspect you are swamping your drive.
>>
>> You might also want to run atop and check the drive busy percentage; I
>> would guess that you are hitting high percentages.
>>
>> François
>>
>> On May 1, 2011, at 4:29 PM, Daniel Huss wrote:
>>
>>> Thanks for the tool recommendation! This is the dstat output during
>>> commit bombardment / concurrent search requests:
>>>
>>> ----total-cpu-usage---- -dsk/total- ---paging-- ---system-- ---swap--- --io/total- --file-locks--
>>> usr sys idl wai hiq siq| read  writ|  in   out | int  csw | used free| read writ|pos lck rea wri
>>>  11   1  87   1   0   0|1221k  833k| 538B 828B| 784  920 | 197M  11G|16.8 15.5 |4.0 9.0   0  13
>>>  60   0  40   0   0   0|   0     0 |   0    0 | 811  164 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 576   85 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 572   90 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  25   0  74   0   0   0|   0     0 |   0    0 | 730  204 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  26   1  71   2   0   0|4096B 1424k|   0    0 | 719  415 | 197M  11G|1.00 46.0 |4.0 9.0   0  13
>>>  31   1  68   0   0   0|   0   136k|   0    0 | 877  741 | 197M  11G|   0 6.00 |5.0 9.0   0  14
>>>  70   6  24   0   0   0|   0   516k|   0    0 |1705 1027 | 197M  11G|   0 46.0 |5.0  11 1.0  15
>>>  72   3  25   0   0   0|4096B  384k|   0    0 |1392  910 | 197M  11G|1.00 25.0 |5.0 9.0   0  14
>>>  60   2  25  12   0   0| 688k  108k|   0    0 |1162  509 | 197M  11G|79.0 9.00 |4.0 9.0   0  13
>>>  94   1   5   0   0   0| 116k    0 |   0    0 |1271  654 | 197M  11G|4.00    0 |4.0 9.0   0  13
>>>  57   0  43   0   0   0|   0     0 |   0    0 |1076  238 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  26   0  73   0   0   0|   0    16k|   0    0 | 830  188 | 197M  11G|   0 2.00 |4.0 9.0   0  13
>>>  29   1  70   0   0   0|   0     0 |   0    0 |1088  360 | 197M  11G|   0    0 |4.0 9.0   0  13
>>>  29   1  70   0   0   1|   0   228k|   0    0 | 890  590 | 197M  11G|   0 21.0 |4.0 9.0   0  13
>>>  81   6  13   0   0   0|4096B 1596k|   0    0 |1227  441 | 197M  11G|1.00 52.0 |5.0 9.0   0  14
>>>  48   2  48   1   0   0| 172k    0 |   0    0 | 953  292 | 197M  11G|21.0    0 |5.0 9.0   0  14
>>>  25   0  74   0   0   0|   0     0 |   0    0 | 808  222 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  25   0  74   0   0   0|   0     0 |   0    0 | 607   90 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 603  106 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0   144k|   0    0 | 625  104 | 197M  11G|   0 7.00 |5.0 9.0   0  14
>>>  85   3   9   2   0   0| 248k   92k|   0    0 |1441  887 | 197M  11G|33.0 7.00 |5.0 9.0   0  14
>>>  32   1  65   2   0   0| 404k  636k|   0    0 | 999  337 | 197M  11G|38.0 96.0 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 609  117 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 604   77 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  26   0  74   0   0   0|   0     0 |   0    0 | 781  183 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 620  110 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  46   4  50   0   0   0|   0   116k|   0    0 | 901  398 | 197M  11G|   0 12.0 |4.0 9.0   0  13
>>>  50   2  47   0   0   0|   0     0 |   0    0 |1031  737 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  28   1  71   0   0   0|4096B  168k|   0    0 | 800  254 | 197M  11G|1.00 9.00 |5.0 9.0   0  14
>>>  25   0  75   0   0   0|   0     0 |   0    0 | 571   84 | 197M  11G|   0    0 |5.0 9.0   0  14
>>>  26   0  73   1   0   0|   0  1172k|   0    0 | 632  209 | 197M  11G|   0 40.0 |5.0 9.0   0  14
>>>
>>> For the short term, we should be fine if we put those single-document
>>> jobs in a queue that gets flushed every 60 seconds.
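>>>
>>> (If I'm reading the solrconfig.xml docs right, Solr's autoCommit could do
>>> that time-based batching for us instead of a hand-rolled queue -- something
>>> like this, untested:
>>>
>>>   <updateHandler class="solr.DirectUpdateHandler2">
>>>     <autoCommit>
>>>       <!-- commit pending documents at most once per minute -->
>>>       <maxTime>60000</maxTime>
>>>     </autoCommit>
>>>   </updateHandler>
>>>
>>> and the daemon would then just add documents without issuing commits.)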
>>>
>>> Also, I should have mentioned that our index is currently 27 GB and
>>> contains 23,223,885 "documents" (only the PK is actually stored). For
>>> some reason I was assuming the commit time to be constant, but that is
>>> probably not the case (?)
>>>
>>> Sooner or later someone is going to profile the container that runs Solr
>>> and our document streamer. I'll post the results if we find anything of
>>> interest.
>>>
>>> =================================
>>>
>>> As a side note, I've only just discovered that Solr 3.1 has been released
>>> (yaaaay!) We're currently using 1.4.1.
>>>
>>>> If you are on Linux, I would recommend two tools you can use to track
>>>> what is going on on the machine: atop ( http://freshmeat.net/projects/atop/ )
>>>> and dstat ( http://freshmeat.net/projects/dstat/ ).
>>>>
>>>> atop in particular has been very useful to me in tracking down
>>>> performance issues, either in real time (when I am running a process) or
>>>> at random intervals (when the machine slows down for no apparent reason).
>>>>
>>>> From the little you have told us, my hunch is that you are saturating a
>>>> disk somewhere, either the index disk or swap (as pointed out by Mike).
>>>>
>>>> Cheers
>>>>
>>>> François
>>>>
>>>> On May 1, 2011, at 9:54 AM, Michael McCandless wrote:
>>>>
>>>>> Committing too frequently is very costly, since each commit calls fsync
>>>>> on numerous files under the hood, which strains the IO system and can
>>>>> cut into queries. If you really want to commit frequently, turning on
>>>>> the compound file format could help, since that's one file to fsync
>>>>> instead of N, per segment.
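>>>>>
>>>>> (In Solr that's the useCompoundFile flag in solrconfig.xml, under
>>>>> <indexDefaults> / <mainIndex>; something like the following, though I
>>>>> haven't checked the exact 1.4.1 example config.)
>>>>>
>>>>>   <indexDefaults>
>>>>>     <!-- write each segment as a single compound file: fewer files to fsync -->
>>>>>     <useCompoundFile>true</useCompoundFile>
>>>>>     ...
>>>>>   </indexDefaults>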
>>>>>
>>>>> Also, if you have a large merge running (turning on IndexWriter's
>>>>> infoStream will tell you), this can cause the OS to swap pages out,
>>>>> unless you set swappiness (if you're on Linux) to 0.
>>>>>
>>>>> Finally, beware of having too large a JVM max heap; you may accumulate
>>>>> long-lived, uncollected garbage, which the OS may happily swap out
>>>>> (since the pages are never touched), and that kills performance when
>>>>> GC finally runs. I describe this here:
>>>>> http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html
>>>>> It's good to leave some RAM for the OS to use as IO cache.
>>>>>
>>>>> Ideally, merging should not evict pages from the OS's buffer cache,
>>>>> but unfortunately the low-level IO flags to control this (e.g.
>>>>> fadvise/madvise) are not available in Java (I wrote about that here:
>>>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>>>>
>>>>> However, we have a GSoC student this summer working on the problem
>>>>> (see https://issues.apache.org/jira/browse/LUCENE-2795), so after this
>>>>> is done we'll have a NativeUnixDirectory impl that hopefully prevents
>>>>> buffer cache eviction due to merging, without you having to tweak
>>>>> swappiness settings.
>>>>>
>>>>> Mike
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Sat, Apr 30, 2011 at 9:23 PM, Craig Stires <craig.sti...@gmail.com> wrote:
>>>>>> Daniel,
>>>>>>
>>>>>> I've been able to post documents to Solr without degrading the
>>>>>> performance of search. But I did have to make some changes to
>>>>>> solrconfig.xml (ramBufferSizeMB, mergeFactor, autoCommit, etc.).
>>>>>>
>>>>>> What I found helpful was having a look at what was causing the OS to
>>>>>> grind. If your system is swapping too much to disk, you can check
>>>>>> whether bumping up the RAM (-Xms512m -Xmx1024m) alleviates it. Even if
>>>>>> this isn't the fix, you can at least isolate whether it's a memory
>>>>>> issue or a disk I/O issue (e.g. running an optimize on every commit).
>>>>>>
>>>>>> It is also worth having a look in your logs to see if the server is
>>>>>> complaining about memory, about issues with your schema, or about some
>>>>>> other unexpected problem.
>>>>>>
>>>>>> A resource that has been helpful for me:
>>>>>> http://wiki.apache.org/solr/SolrPerformanceFactors
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel Huss [mailto:hussdl1985-solrus...@yahoo.de]
>>>>>> Sent: Sunday, 1 May 2011 5:35 AM
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Searching performance suffers tremendously during indexing
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> our Solr-based search is unresponsive while documents are being
>>>>>> indexed. The documents to index (results of a DB query) are sent to
>>>>>> Solr by a daemon in batches of varying size. The number of documents
>>>>>> per batch may vary between one and several hundred thousand.
>>>>>>
>>>>>> Before investigating any further, I would like to ask whether this can
>>>>>> be considered an issue at all. I was expecting Solr to handle
>>>>>> concurrent indexing/searching quite well; in fact this was one of the
>>>>>> main reasons for choosing Solr over the searching capabilities of our
>>>>>> RDBMS.
>>>>>>
>>>>>> Is searching performance *supposed* to drop while documents are being
>>>>>> indexed?
>>>>>>
>
> --
> Lance Norskog
> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com