Hi Jason,

After moving to more RAM and CPUs and setting ramBufferSizeMB=8192, the
problem disappeared; I had 100 million documents added in 24 hours almost
without any index merge (mergeFactor=10). Lucene flushes a segment to disk
when the RAM buffer is full; then the MergePolicy orchestrates the merges...
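For reference, the relevant solrconfig.xml settings look roughly like this
(the 8192/10 values are just what I'm running now; the exact section -
indexDefaults vs. mainIndex - depends on the Solr version):

    <indexDefaults>
      <ramBufferSizeMB>8192</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>

With a buffer that large, segments only hit the disk when the buffer fills,
so with mergeFactor=10 merges stay rare.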

However, the 500GB Seagate SATA drive quickly failed on SuSE Linux 10 & a
Tyan Thunder motherboard :((( - it happened while SOLR was merging two
segments of about 10GB... I reinstalled SLES and started again; I have
ordered an Adaptec SAS RAID controller & Seagate Cheetah 15K.5 SAS drives.

I am wondering how one can run Nutch on SATA (if Nutch is fast enough)... I
had constant problems with Oracle block corruption on Seagate Barracuda SATA
drives several years ago, then moved to Cheetah...

A good SCSI controller (with a dedicated CPU and cache!!!) + Cheetah 15K.5
(with 16MB cache!!!) - then we don't need to flush a whole 8KB block when
only a few hundred bytes changed... it's not easy to assemble good
"commodity" hardware from parts...

I am going to use Hadoop for pre-data-mining before indexing with SOLR;
currently I use a mix of MySQL & HBase...

Thanks for the input!



-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-17-09 1:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

Fuad,

I'd recommend indexing in Hadoop, then copying the new indexes to Solr
slaves.  This removes the need for Solr master servers.  Of course
you'd need a Hadoop cluster larger than the number of master servers
you have now.  The merge indexes command (which can be taxing on the
servers because it performs a copy) could be used.
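Something like the CoreAdmin merge call, roughly (the host and index paths
below are only placeholders - check the wiki for the exact parameters):

    http://master:8983/solr/admin/cores?action=mergeindexes
        &core=core0
        &indexDir=/path/to/hadoop-built/index1
        &indexDir=/path/to/hadoop-built/index2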

It would be good to improve Solr's integration with Hadoop, as otherwise
reindexing (such as for a schema change) becomes an onerous task.

-J

On Tue, Aug 11, 2009 at 2:31 PM, Fuad Efendi<f...@efendi.ca> wrote:
> Forgot to add: committing only once a day
>
> I tried mergeFactor=1000 and the performance of index writes was extremely
> good (more than 50,000,000 updates during part of a day).
> However, "commit" was taking 2 days or more and I simply killed the process
> (suspecting that it could break my hard drive); I had about 8000 files in
> the index that day... 3 minutes of waiting until each new small *.del file
> appeared, and after several thousand such files I killed the process.
>
> Most probably it's the "delete" handling in Lucene... does it need to
> rewrite the inverted index (in fact, to optimize)...? Not sure.
>
>
>
> -----Original Message-----
>
> Never tried profiling;
> 3000-5000 docs per second if SOLR is not busy with segment merge;
>
> During a segment merge, 99% CPU and no disk swapping; I don't suspect I/O...
>
> During document updates (small batches of 100-1000 docs), only 5-15% CPU.
>
> -server option of the JVM (which is JRockit) with 2048Mb heap + 256M for
> the RAM buffer.
>
> I don't suspect garbage collection... I'll try the same tomorrow with much
> better hardware (2 quad-cores instead of a single dual-core, SCSI RAID0
> instead of a single SAS drive, 16Gb for Tomcat instead of the current 2Gb),
> but the constant 5:1 ratio is very suspicious...
>
>
>
> -----Original Message-----
> From: Grant Ingersoll
> Sent: August-11-09 5:01 PM
>
> Have you tried profiling?  How often are you committing?  Have you
> looked at Garbage Collection or any of the usual suspects like that?
>
>
> On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote:
>
>> In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM
>> Buffer Flush / Segment Merge per 1 minute of (heavy) batch document
>> updates.
>
> Define heavy.  How many docs per second?
>

