Hi Jason,

I am using a master/slave setup (two servers).
I monitored it for a few hours today: about 1 minute of document updates
(roughly 100,000 documents), and then Solr stops for at least 5 minutes to do
background work such as RAM buffer flushes, segment merges...

Documents are small; the total index size is about 10 GB for 50,000,000
documents.

I suspect "delete" is the main bottleneck for Lucene, since it marks
documents for deletion and then needs to rewrite the inverted indexes to
reclaim them (in effect, an optimize)...


I run "update" queries to update documents. I have a timestamp field, and in
many cases I only need to update the timestamp of an existing document (a
specific process periodically deletes expired documents, once a week) - but I
am still using the out-of-the-box /update handler instead of implementing a
specific document handler.
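For reference, a standard /update request re-indexes the entire document even
when only one field changes: Solr deletes the old version by unique key and
adds the new one. A minimal sketch of the payload (field names here are
hypothetical, not taken from my schema):

```xml
<add>
  <doc>
    <field name="id">doc-12345</field>
    <field name="timestamp">2009-08-11T20:45:00Z</field>
    <!-- every other stored field must be resent as well; the old document
         is removed and this one is indexed in its place -->
  </doc>
</add>
```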

I could run it as a batch - for instance, collecting millions of documents
somewhere and removing duplicates before sending them to Solr - but I prefer
to update a document several times during the day; it's faster (although I
encountered a problem...)
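The batching idea above can be sketched as follows - a minimal illustration
(the "id" and "timestamp" field names are assumptions) that keeps only the
newest version of each document before a batch is sent, so Solr sees one
update per document instead of many:

```python
def dedupe_batch(docs):
    """Collapse a batch of pending updates so that only the latest
    version of each document (by unique id) is sent to Solr."""
    latest = {}
    for doc in docs:
        doc_id = doc["id"]
        # Keep whichever version carries the most recent timestamp.
        if doc_id not in latest or doc["timestamp"] > latest[doc_id]["timestamp"]:
            latest[doc_id] = doc
    return list(latest.values())
```

This trades index freshness for fewer delete/re-add cycles, which is exactly
the cost being discussed in this thread.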


Thanks,
Fuad



-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-11-09 4:45 PM
To: solr-user@lucene.apache.org
Subject: Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Fuad,

The lock indicates to external processes that the index is in use; it does
not cause ConcurrentMergeScheduler to block.

ConcurrentMergeScheduler does merge in its own thread; however, if the
merges are large they can spike IO and CPU and make the machine somewhat
unresponsive.

What is the size of your index (in docs and GB)? How many
deletes are you performing? There are a few possible solutions
to these problems if you're able to separate the updating from
the searching onto different servers.

-J

On Tue, Aug 11, 2009 at 10:08 AM, Fuad Efendi<f...@efendi.ca> wrote:
> 1.       I always have the files lucene-xxxx-write.lock and
> lucene-xxxx-n-write.lock, which I believe shouldn't be present with
> NativeFSLockFactory
>
> 2.       I use mergeFactor=100 and ramBufferSizeMB=256, with an index a few
> GB in size. I tried mergeFactor=10 and mergeFactor=1000.
>
>
>
>
>
> It seems ConcurrentMergeScheduler locks everything instead of using a
> separate thread in the background...
>
>
>
>
>
> So my configured system spends half an hour UPDATING a million documents
> (probably already existing in the index), then it stops and waits a few
> hours for an index merge, which is extremely slow (a lot of deletes?)
>
>
>
> With mergeFactor=1000 I had extremely performant index updates (50,000,000
> on the first day), and then I waited more than 2 days for the merge to
> complete (and was forced to kill the process).
>
>
>
> Why does it lock everything?
>
>
>
> Thanks,
>
> Fuad
>
>
>
>
