Fuad, I'd recommend indexing in Hadoop and then copying the new indexes to the Solr slaves. This removes the need for Solr master servers, though of course you'd need a Hadoop cluster larger than the number of master servers you have now. The merge indexes command could be used for this, though it can be taxing on the servers because it performs a copy.
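By "merge indexes command" I mean, roughly, the CoreAdmin mergeindexes action. A minimal sketch in Java of what the call could look like (the Solr URL, core name, and index path are placeholders, and error handling is omitted):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            // Placeholder values: the target core on the slave and the directory
            // holding the index that was built in Hadoop and copied over.
            String solrBase = "http://localhost:8983/solr";
            String targetCore = "core0";
            String newIndexDir = "/data/hadoop-built-index";

            // CoreAdmin mergeindexes action: merges the index at indexDir
            // into the target core's existing index.
            String url = solrBase + "/admin/cores?action=mergeindexes"
                    + "&core=" + URLEncoder.encode(targetCore, "UTF-8")
                    + "&indexDir=" + URLEncoder.encode(newIndexDir, "UTF-8");

            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // status response from Solr
            }
            in.close();

            // A commit on the target core is still needed afterwards to make
            // the merged documents visible to searchers.
        }
    }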
It would be good to improve Solr's integration with Hadoop, as otherwise
reindexing (such as for a schema change) becomes an onerous task.

-J

On Tue, Aug 11, 2009 at 2:31 PM, Fuad Efendi <f...@efendi.ca> wrote:
> Forgot to add: committing only once a day.
>
> I tried mergeFactor=1000 and index write performance was extremely good
> (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the process
> (suspecting that it could break my hard drive); I had about 8000 files in the
> index that day... 3 minutes of waiting until a new small *.del file appeared,
> and after several thousand such files I killed the process.
>
> Most probably it's "delete" in Lucene... it needs to rewrite the inverted
> index (in fact, to optimize)...? Not sure.
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with a segment merge.
>
> During a segment merge: 99% CPU, no disk swap; I don't suspect I/O...
>
> During document updates (small batches of 100-1000 docs): only 5-15% CPU.
>
> -server 2048M option of the JVM (which is JRockit) + 256M for the RAM buffer.
>
> I don't suspect garbage collection... I'll try the same with much better
> hardware tomorrow (2 quad-core instead of a single dual-core, SCSI RAID0
> instead of a single SAS disk, 16Gb for Tomcat instead of the current 2Gb),
> but a constant 5:1 ratio is very suspicious...
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling? How often are you committing? Have you
> looked at garbage collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded write-only master SOLR, I have 5 minutes of RAM buffer
>> flush / segment merge per 1 minute of (heavy) batch document updates.
>
> Define heavy. How many docs per second?
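As a footnote on the settings being discussed: in Solr they map to mergeFactor and ramBufferSizeMB in solrconfig.xml, but underneath they are just Lucene IndexWriter knobs. A rough sketch, assuming a Lucene 2.9-style API; the index path and document are placeholders, and the values are simply the ones from this thread, not recommendations:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class IndexTuning {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/data/index")),   // placeholder path
                    new StandardAnalyzer(Version.LUCENE_29),
                    true,                                        // create a new index
                    IndexWriter.MaxFieldLength.UNLIMITED);

            // The two knobs from the thread:
            writer.setRAMBufferSizeMB(256);  // 256M buffered in RAM before flushing a segment
            writer.setMergeFactor(1000);     // very high: few merges while indexing,
                                             // but thousands of segments left to merge later

            Document doc = new Document();   // placeholder document
            doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);

            // Committing rarely (e.g. once a day, as above) defers the merge cost;
            // the eventual commit then has to churn through all accumulated segments.
            writer.commit();
            writer.close();
        }
    }

With a merge factor that high, segments pile up during indexing and the deferred merge work all lands on the eventual commit, which would fit the multi-day commit described above.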