Hi Jason,

I am using Master/Slave (two servers). I monitored it for a few hours today: about 1 minute of document updates (roughly 100,000 documents), and then SOLR stops for at least 5 minutes to do background jobs such as RAM flush, segment merge, etc.
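For reference, the relevant part of my solrconfig.xml looks roughly like this (a minimal sketch; the mergeScheduler element follows the Solr 1.3 example config, so the exact syntax may differ in other versions):

  <indexDefaults>
    <!-- buffer up to this many MB of pending documents in RAM before flushing a segment -->
    <ramBufferSizeMB>256</ramBufferSizeMB>
    <!-- number of segments that accumulate per level before a merge is triggered -->
    <mergeFactor>100</mergeFactor>
    <!-- NativeFSLockFactory, per the subject of this thread -->
    <lockType>native</lockType>
    <!-- merges are supposed to run in a background thread -->
    <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>
  </indexDefaults>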
Documents are small; total index size is about 10GB for 50,000,000 documents. I suspect "delete" is the main bottleneck for Lucene, since it marks documents for deletion and then needs to rewrite (in effect, optimize) the inverted indexes...

I run "update" queries to update documents. I have a timestamp field, and in many cases I only need to update the timestamp of an existing document (a separate process deletes expired documents once a week) - but I am still using the out-of-the-box /update handler instead of implementing a specific document handler. I could run it as a batch - for instance, collecting millions of documents somewhere and removing duplicates before sending them to SOLR - but I prefer to update a document several times during the day; it's faster (although I encountered a problem...). A sketch of what such an update batch looks like is at the end of this message, below the quoted thread.

Thanks,
Fuad

-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: August-11-09 4:45 PM
To: solr-user@lucene.apache.org
Subject: Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Fuad,

The lock indicates to external processes that the index is in use; it does not cause ConcurrentMergeScheduler to block. ConcurrentMergeScheduler does merge in its own thread; however, if the merges are large they can spike IO and CPU and make the machine somewhat unresponsive.

What is the size of your index (in docs and GB)? How many deletes are you performing?

There are a few possible solutions to these problems if you're able to separate the updating from the searching onto different servers.

-J

On Tue, Aug 11, 2009 at 10:08 AM, Fuad Efendi<f...@efendi.ca> wrote:
> 1. I always have files lucene-xxxx-write.lock and
> lucene-xxxx-n-write.lock, which I believe shouldn't be used with
> NativeFSLockFactory.
>
> 2. I use mergeFactor=100 and ramBufferSizeMB=256, with an index size of a
> few GB. I also tried mergeFactor=10 and mergeFactor=1000.
>
> It seems ConcurrentMergeScheduler locks everything instead of using a
> separate background thread...
>
> So my configured system spends half an hour UPDATEing a million documents
> (probably already existing in the index), then it stops and waits a few
> hours for the index merge, which is extremely slow (a lot of deletes?).
>
> With mergeFactor=1000 I had extremely fast index updates (50,000,000 on
> the first day), and then I waited more than 2 days for the merge to
> complete (and was forced to kill the process).
>
> Why does it lock everything?
>
> Thanks,
> Fuad
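P.S. For completeness, a minimal sketch of what one of my update batches to the stock /update handler looks like (the "id" field name and all values are placeholders; only the timestamp field is taken from my actual setup). Note that /update cannot change a single field in place - it deletes the old document and re-adds the whole one - which is probably why timestamp-only updates still generate so many deletes:

  <add>
    <doc>
      <field name="id">12345</field>
      <field name="timestamp">2009-08-11T00:00:00Z</field>
      <!-- every other stored field has to be resent as well -->
    </doc>
    <!-- ...many more <doc> entries in the same batch... -->
  </add>

and the weekly expiry process is effectively a delete-by-query:

  <delete><query>timestamp:[* TO NOW-7DAYS]</query></delete>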