Yes, indexing generally slows down querying. Most sites do their indexing on one Solr instance and serve queries from another. The indexing side periodically does 'merging', which involves copying large amounts of data between index files.
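If you split the two, Solr's built-in Java replication (1.4+) is the usual way to tie them together. A minimal sketch of the two solrconfig.xml handlers follows; the host name, port, poll interval, and confFiles list are placeholders, not values from your setup:

    <!-- on the indexing (master) box: publish a new index version after each commit -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on the query (slave) box: pull finished segment files on a timer -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

With that split, the merge and commit I/O stays on the master; the query box only pays for copying completed segment files.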
With 10G allocated to a 1G load, the JVM is doing a lot more garbage collection than it would with a 1.5G allocation, so I suggest dropping the memory allocation. Also, the operating system is very good at managing the disk cache, and this actually works better than having Solr cache the index data itself. The main problem is index merging. If you are on 3.1, there is an alternate merge "policy" called BalancedSegmentMergePolicy. It is fine-tuned for mixing indexing and querying on one Solr instance; it was written at LinkedIn because their web-facing searchers do indexing & searching simultaneously.

2011/5/1 François Schiettecatte <fschietteca...@gmail.com>:
> Couple of things. One, you are not swapping, which is a good thing. Second (and
> I am not sure what delay you selected for dstat, I would assume the default
> of 1 second), there is some pretty heavy write activity like this:
>
> 26 1 71 2 0 0|4096B 1424k| 0 0 | 719 415 | 197M 11G|1.00 46.0 |4.0 9.0 0 13
>
> where you are writing out 1.4MB, for example. This is happening pretty
> regularly, so I suspect you are swamping your drive.
>
> You might also want to run atop and check the drive busy percentage; I would
> guess that you are hitting high percentages.
>
> François
>
> On May 1, 2011, at 4:29 PM, Daniel Huss wrote:
>
>> Thanks for the tool recommendation! This is the dstat output during
>> commit bombardment / concurrent search requests:
>>
>> ----total-cpu-usage---- -dsk/total- ---paging-- ---system-- ----swap--- --io/total- ---file-locks--
>> usr sys idl wai hiq siq| read writ| in out | int csw | used free| read writ|pos lck rea wri
>> 11 1 87 1 0 0|1221k 833k| 538B 828B| 784 920 | 197M 11G|16.8 15.5 |4.0 9.0 0 13
>> 60 0 40 0 0 0| 0 0 | 0 0 | 811 164 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 25 0 75 0 0 0| 0 0 | 0 0 | 576 85 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 25 0 75 0 0 0| 0 0 | 0 0 | 572 90 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 25 0 74 0 0 0| 0 0 | 0 0 | 730 204 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 26 1 71 2 0 0|4096B 1424k| 0 0 | 719 415 | 197M 11G|1.00 46.0 |4.0 9.0 0 13
>> 31 1 68 0 0 0| 0 136k| 0 0 | 877 741 | 197M 11G| 0 6.00 |5.0 9.0 0 14
>> 70 6 24 0 0 0| 0 516k| 0 0 |1705 1027 | 197M 11G| 0 46.0 |5.0 11 1.0 15
>> 72 3 25 0 0 0|4096B 384k| 0 0 |1392 910 | 197M 11G|1.00 25.0 |5.0 9.0 0 14
>> 60 2 25 12 0 0| 688k 108k| 0 0 |1162 509 | 197M 11G|79.0 9.00 |4.0 9.0 0 13
>> 94 1 5 0 0 0| 116k 0 | 0 0 |1271 654 | 197M 11G|4.00 0 |4.0 9.0 0 13
>> 57 0 43 0 0 0| 0 0 | 0 0 |1076 238 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 26 0 73 0 0 0| 0 16k| 0 0 | 830 188 | 197M 11G| 0 2.00 |4.0 9.0 0 13
>> 29 1 70 0 0 0| 0 0 | 0 0 |1088 360 | 197M 11G| 0 0 |4.0 9.0 0 13
>> 29 1 70 0 0 1| 0 228k| 0 0 | 890 590 | 197M 11G| 0 21.0 |4.0 9.0 0 13
>> 81 6 13 0 0 0|4096B 1596k| 0 0 |1227 441 | 197M 11G|1.00 52.0 |5.0 9.0 0 14
>> 48 2 48 1 0 0| 172k 0 | 0 0 | 953 292 | 197M 11G|21.0 0 |5.0 9.0 0 14
>> 25 0 74 0 0 0| 0 0 | 0 0 | 808 222 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 25 0 74 0 0 0| 0 0 | 0 0 | 607 90 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 0 | 0 0 | 603 106 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 144k| 0 0 | 625 104 | 197M 11G| 0 7.00 |5.0 9.0 0 14
>> 85 3 9 2 0 0| 248k 92k| 0 0 |1441 887 | 197M 11G|33.0 7.00 |5.0 9.0 0 14
>> 32 1 65 2 0 0| 404k 636k| 0 0 | 999 337 | 197M 11G|38.0 96.0 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 0 | 0 0 | 609 117 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 0 | 0 0 | 604 77 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 26 0 74 0 0 0| 0 0 | 0 0 | 781 183 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 0 | 0 0 | 620 110 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 46 4 50 0 0 0| 0 116k| 0 0 | 901 398 | 197M 11G| 0 12.0 |4.0 9.0 0 13
>> 50 2 47 0 0 0| 0 0 | 0 0 |1031 737 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 28 1 71 0 0 0|4096B 168k| 0 0 | 800 254 | 197M 11G|1.00 9.00 |5.0 9.0 0 14
>> 25 0 75 0 0 0| 0 0 | 0 0 | 571 84 | 197M 11G| 0 0 |5.0 9.0 0 14
>> 26 0 73 1 0 0| 0 1172k| 0 0 | 632 209 | 197M 11G| 0 40.0 |5.0 9.0 0 14
>>
>> For the short term, we should be fine if we put those single-document
>> jobs in a queue that gets flushed every 60 seconds.
>>
>> Also, I should have mentioned that our index is currently 27 GB and
>> contains 23,223,885 "documents" (only the PK is actually stored). For
>> some reason I was assuming the commit time complexity to be constant,
>> but that is probably not the case (?)
>>
>> Sooner or later someone is going to profile the container that runs Solr
>> and our document streamer. I'll post the results if we find anything of
>> interest.
>>
>> =================================
>>
>> As a side note, I've only just discovered that Solr 3.1 has been released
>> (yaaaay!) We're currently using 1.4.1.
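(On the 60-second queue: you can get the same effect on the Solr side with autoCommit in solrconfig.xml, so the feeder just streams documents and Solr batches the commits itself. A minimal sketch; the thresholds below are illustrative, not tuned for your load:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs> <!-- commit once this many docs are pending -->
        <maxTime>60000</maxTime> <!-- or after 60 seconds, whichever comes first -->
      </autoCommit>
    </updateHandler>

Either threshold triggers a commit, so single-document updates no longer each pay the full fsync-and-reopen cost.)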
>>> If you are on Linux, I would recommend two tools you can use to track what
>>> is going on on the machine: atop ( http://freshmeat.net/projects/atop/ )
>>> and dstat ( http://freshmeat.net/projects/dstat/ ).
>>>
>>> atop in particular has been very useful to me in tracking down performance
>>> issues in real time (when I am running a process) or at random intervals
>>> (when the machine slows down for no apparent reason).
>>>
>>> From the little you have told us, my hunch is that you are saturating a disk
>>> somewhere, either the index disk or swap (as pointed out by Mike).
>>>
>>> Cheers
>>>
>>> François
>>>
>>> On May 1, 2011, at 9:54 AM, Michael McCandless wrote:
>>>
>>>> Committing too frequently is very costly, since this calls fsync on
>>>> numerous files under the hood, which strains the IO system and can cut
>>>> into queries. If you really want to commit frequently, turning on compound
>>>> file format could help, since that's 1 file to fsync instead of N, per
>>>> segment.
>>>>
>>>> Also, if you have a large merge running (turning on IW's infoStream
>>>> will tell you), this can cause the OS to swap pages out, unless you
>>>> set swappiness (if you're on Linux) to 0.
>>>>
>>>> Finally, beware of having too large a JVM max heap; you may accumulate
>>>> long-lived, uncollected garbage, which the OS may happily swap out
>>>> (since the pages are never touched), which then kills performance when
>>>> GC finally runs. I describe this here:
>>>> http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html
>>>> It's good to leave some RAM for the OS to use as IO cache.
>>>>
>>>> Ideally, merging should not evict pages from the OS's buffer cache,
>>>> but unfortunately the low-level IO flags to control this (eg
>>>> fadvise/madvise) are not available in Java (I wrote about that here:
>>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>>>
>>>> However, we have a GSoC student this summer working on the problem
>>>> (see https://issues.apache.org/jira/browse/LUCENE-2795), so after this
>>>> is done we'll have a NativeUnixDirectory impl that hopefully prevents
>>>> buffer cache eviction due to merging without you having to tweak
>>>> swappiness settings.
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Sat, Apr 30, 2011 at 9:23 PM, Craig Stires <craig.sti...@gmail.com> wrote:
>>>>> Daniel,
>>>>>
>>>>> I've been able to post documents to Solr without degrading the performance
>>>>> of search. But I did have to make some changes to the solrconfig.xml
>>>>> (ramBufferSize, mergeFactor, autoCommit, etc).
>>>>>
>>>>> What I found to be helpful was having a look at what was causing the OS
>>>>> to grind. If your system is swapping too much to disk, you can check
>>>>> whether bumping up the RAM (-Xms512m -Xmx1024m) alleviates it. Even if
>>>>> this isn't the fix, you can at least isolate whether it's a memory issue
>>>>> or a disk I/O issue (e.g. running optimization on every commit).
>>>>>
>>>>> It's also worth having a look in your logs to see whether the server is
>>>>> complaining about memory, issues with your schema, or some other
>>>>> unexpected problem.
>>>>>
>>>>> A resource that has been helpful for me:
>>>>> http://wiki.apache.org/solr/SolrPerformanceFactors
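(The knobs Craig lists live in the <indexDefaults>/<mainIndex> sections of solrconfig.xml, and the compound file format Mike mentions is switched on in the same place. A sketch only; the values are illustrative, not tuned for this index:

    <indexDefaults>
      <useCompoundFile>true</useCompoundFile> <!-- one file to fsync per segment instead of many -->
      <ramBufferSizeMB>64</ramBufferSizeMB>   <!-- buffer more docs in RAM before flushing a segment -->
      <mergeFactor>10</mergeFactor>           <!-- lower favors query speed, higher favors indexing throughput -->
    </indexDefaults>

A bigger RAM buffer plus less frequent commits produces fewer, larger segments, which means fewer merges competing with queries for disk.)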
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel Huss [mailto:hussdl1985-solrus...@yahoo.de]
>>>>> Sent: Sunday, 1 May 2011 5:35 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Searching performance suffers tremendously during indexing
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> our Solr-based search is unresponsive while documents are being indexed.
>>>>> The documents to index (results of a DB query) are sent to Solr by a
>>>>> daemon in batches of varying size. The number of documents per batch may
>>>>> vary between one and several hundred thousand.
>>>>>
>>>>> Before investigating any further, I would like to ask whether this can be
>>>>> considered an issue at all. I was expecting Solr to handle concurrent
>>>>> indexing/searching quite well; in fact this was one of the main reasons
>>>>> for choosing Solr over the searching capabilities of our RDBMS.
>>>>>
>>>>> Is searching performance *supposed* to drop while documents are being
>>>>> indexed?

--
Lance Norskog
goks...@gmail.com