I think we sorely need a Directory impl that down-prioritizes IO performed by merging.
It would be wonderful if from Java we could simply set a per-thread "IO priority", but it'll be a looong time until that's possible.  So I think for now we should make a Directory impl that emulates such behavior, e.g. Lucene could state the "context" (merge, flush, search, nrt-reopen, etc.) whenever it opens an IndexInput / IndexOutput, and then the Directory could hack in pausing the merge IO whenever search/nrt-reopen IO is active.

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Jerome L Quinn wrote:
>> Hi, everyone, this is a problem I've had for quite a while,
>> and have basically avoided optimizing because of it.  However,
>> eventually we will get to the point where we must delete as
>> well as add docs continuously.
>>
>> I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
>> instance running inside tomcat 6, so no replication.  Merge factor is the
>> default 10.  ramBufferSizeMB is 32.  maxWarmingSearchers=4.
>> autoCommit is set at 3 sec.
>>
>> We continually push new data into the index, at somewhere between 1-10 docs
>> every 10 sec or so.  Solr is running on a quad-core 3.0GHz server
>> under IBM java 1.6.  The index is sitting on a local 15K scsi disk.
>> There's nothing else of substance running on the box.
>>
>> Optimizing the index takes about 65 min.
>>
>> As long as I'm not optimizing, search and indexing times are satisfactory.
>>
>> When I start the optimize, I see massive problems with timeouts pushing
>> new docs into the index, and search times balloon.  A typical search
>> while optimizing takes about 1 min instead of a few seconds.
>>
>> Can anyone offer me help with fixing the problem?
>>
>> Thanks,
>> Jerry Quinn
>>
> Ah, the pains of optimization.  It's kind of just how it is.  One solution
> is to use two boxes and replication - optimize on the master, and then
> queries only hit the slave.  Out of reach for some though, and adds many
> complications.
>
> Another kind of option is to use the partial optimize feature:
>
>   <optimize maxOptimizeSegments="5"/>
>
> Using this, you can optimize down to n segments and take a shorter hit
> each time.
>
> Also, if optimizing is so painful, you might lower the merge factor to
> amortize that pain better.  That's another way to slowly get there - if
> you lower the merge factor, as merging takes place, the new merge factor
> will be respected, and segments will merge down.  A merge factor of 2
> (the lowest) will make it so you only ever have 2 segments.  Sometimes
> that works reasonably well - you could try 3-6 or something as well.
> Then when you do your partial optimizes (and eventually a full optimize
> perhaps), you won't have so far to go.
>
> --
> - Mark
>
> http://www.lucidimagination.com
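[Editor's note: a minimal sketch of the "pause merge IO while search IO is active" idea Mike describes above.  This is plain Java with invented names, not a real Lucene API: a shared gate that a hypothetical throttling Directory wrapper could consult, where search-context streams bracket their reads and merge-context streams yield while any search IO is in flight.]

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical gate (all names invented for illustration).  A Directory
// wrapper would hand the same gate to every IndexInput/IndexOutput it
// opens, tagged by context: search-context streams call searchIOStart()
// / searchIOEnd() around their reads, while merge-context streams call
// awaitMergeTurn() before each chunk of IO, so merges back off whenever
// search IO is active.
class IOPriorityGate {
    private final AtomicInteger activeSearchIO = new AtomicInteger();

    void searchIOStart() { activeSearchIO.incrementAndGet(); }

    void searchIOEnd() { activeSearchIO.decrementAndGet(); }

    // Merge threads call this before each read/write; it parks in ~1 ms
    // increments until no search IO remains in flight.
    void awaitMergeTurn() {
        while (activeSearchIO.get() > 0) {
            LockSupport.parkNanos(1_000_000L); // back off ~1 ms, re-check
        }
    }
}
```

A real implementation would likely rate-limit merge bytes/sec rather than fully pausing, to avoid starving merges entirely under constant query load.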