I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.

It would be wonderful if from Java we could simply set a per-thread
"IO priority", but it'll be a long time until that's possible.

So for now I think we should make a Directory impl that emulates such
behavior: e.g., Lucene could state the "context" (merge, flush, search,
NRT reopen, etc.) whenever it opens an IndexInput / IndexOutput, and
the Directory could then pause merge IO whenever search/NRT-reopen IO
is active.
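
Roughly, the pausing piece could look something like the sketch below.
This is plain Java illustrating the idea, not the real Directory API;
ContextThrottle and the IOContext enum are made-up names here:

  import java.io.IOException;
  import java.io.OutputStream;
  import java.util.concurrent.atomic.AtomicInteger;

  // Sketch: pause merge writes while any search/NRT-reopen IO is in
  // flight.  A real impl would wrap Directory/IndexOutput instead.
  public class ContextThrottle {

    public enum IOContext { MERGE, FLUSH, SEARCH, NRT_REOPEN }

    // count of high-priority (search / NRT-reopen) streams currently open
    private final AtomicInteger highPriorityActive = new AtomicInteger();
    private final Object mergeLock = new Object();

    public void beginHighPriority() {
      highPriorityActive.incrementAndGet();
    }

    public void endHighPriority() {
      if (highPriorityActive.decrementAndGet() == 0) {
        synchronized (mergeLock) {
          mergeLock.notifyAll();  // wake any paused merge writers
        }
      }
    }

    // Called by merge IO before each write; blocks while high-priority
    // IO is active, re-checking periodically to dodge missed notifies.
    void pauseMergeIfNeeded() throws IOException {
      synchronized (mergeLock) {
        while (highPriorityActive.get() > 0) {
          try {
            mergeLock.wait(100);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("merge IO interrupted");
          }
        }
      }
    }

    // Wrap a stream opened under the given context; only MERGE throttles.
    public OutputStream wrap(final OutputStream out, IOContext ctx) {
      if (ctx != IOContext.MERGE) {
        return out;
      }
      return new OutputStream() {
        @Override public void write(int b) throws IOException {
          pauseMergeIfNeeded();
          out.write(b);
        }
        @Override public void write(byte[] b, int off, int len) throws IOException {
          pauseMergeIfNeeded();
          out.write(b, off, len);
        }
        @Override public void flush() throws IOException { out.flush(); }
        @Override public void close() throws IOException { out.close(); }
      };
    }
  }

Merge code would open its outputs through wrap(..., MERGE), while
searchers bracket their IO with beginHighPriority()/endHighPriority().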

Mike

On Thu, Nov 12, 2009 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Jerome L Quinn wrote:
>> Hi, everyone, this is a problem I've had for quite a while,
>> and have basically avoided optimizing because of it.  However,
>> eventually we will get to the point where we must delete as
>> well as add docs continuously.
>>
>> I have a Solr 1.3 index with ~4M docs at around 90G.  This is a single
>> instance running inside Tomcat 6, so no replication.  Merge factor is
>> the default 10, ramBufferSizeMB is 32, maxWarmingSearchers is 4, and
>> autoCommit is set at 3 sec.
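>>
>> For reference, those settings correspond to solrconfig.xml entries
>> along these lines (section layout per the stock 1.3 example config):
>>
>>   <mainIndex>
>>     <mergeFactor>10</mergeFactor>
>>     <ramBufferSizeMB>32</ramBufferSizeMB>
>>   </mainIndex>
>>
>>   <updateHandler class="solr.DirectUpdateHandler2">
>>     <autoCommit>
>>       <maxTime>3000</maxTime> <!-- 3 sec -->
>>     </autoCommit>
>>   </updateHandler>
>>
>>   <query>
>>     <maxWarmingSearchers>4</maxWarmingSearchers>
>>   </query>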
>>
>> We continually push new data into the index, at somewhere between 1
>> and 10 docs every 10 sec or so.  Solr is running on a quad-core 3.0GHz
>> server under IBM Java 1.6.  The index is sitting on a local 15K SCSI
>> disk.  There's nothing else of substance running on the box.
>>
>> Optimizing the index takes about 65 min.
>>
>> As long as I'm not optimizing, search and indexing times are satisfactory.
>>
>> When I start the optimize, I see massive problems with timeouts
>> pushing new docs into the index, and search times balloon.  A typical
>> search while optimizing takes about 1 min instead of a few seconds.
>>
>> Can anyone offer me help with fixing the problem?
>>
>> Thanks,
>> Jerry Quinn
>>
> Ah, the pains of optimization. It's kind of just how it is. One
> solution is to use two boxes and replication - optimize on the master,
> and have queries only hit the slave. That's out of reach for some,
> though, and adds many complications.
>
> Another option is the partial optimize feature:
>
>  <optimize maxSegments="5"/>
>
> Using this, you can optimize down to n segments and take a shorter hit
> each time.
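>
> You can send that through the normal update handler, e.g. (assuming the
> stock /solr/update URL; adjust host/port for your install):
>
>   curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
>     --data-binary '<optimize maxSegments="5"/>'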
>
> Also, if optimizing is so painful, you might lower the merge factor to
> amortize that pain better. That's another way to slowly get there - if
> you lower the merge factor, then as merging takes place, the new merge
> factor will be respected, and segments will merge down. A merge factor
> of 2 (the lowest) will make it so you only ever have 2 segments.
> Sometimes that works reasonably well - you could try 3-6 or something
> as well. Then when you do your partial optimizes (and eventually a full
> optimize, perhaps), you won't have so far to go.
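>
> For example, in solrconfig.xml (1.3-era mainIndex section; the value
> here is illustrative):
>
>   <mainIndex>
>     <mergeFactor>3</mergeFactor> <!-- was 10 -->
>   </mainIndex>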
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
