Optimization merges the index down to a single segment (one huge file), so the 
entire index gets copied on the next replication.  So yes, in that case you 
really do need 2x the disk space.
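If you want to confirm what the slaves are actually pulling after an optimize, 
the stock ReplicationHandler can report it.  Something like this against a 
slave (hostname/port are placeholders for your setup):

```shell
# command=details shows the last replication cycle: files fetched,
# bytes transferred, generation numbers, etc.
curl 'http://slave-host:8080/solr/replication?command=details'
```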

Do you really need to optimize?  We have a pretty big total index (about 200 
million docs) and we never optimize.  But our index is sharded, so the largest 
individual shards are only around 10 million docs.  We use a merge factor of 
2, and we run replication every minute.
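For reference, the knobs for that setup live in solrconfig.xml (this is just a 
sketch for the Solr 1.4/3.x era; the masterUrl is obviously a placeholder):

```xml
<!-- Keep the segment count low as you index instead of optimizing -->
<indexDefaults>
  <mergeFactor>2</mergeFactor>
</indexDefaults>

<!-- Slave side: poll the master once a minute -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```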

In our tests search performance was not much better with optimization, but 
that may be specific to our types of searches, etc.  You may see different 
results.
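If you do want to keep optimizing, one possible middle ground (we haven't 
tested this ourselves) is to stop short of a single segment, so only part of 
the index is rewritten and shipped.  The 8 below is an arbitrary example:

```shell
# maxSegments > 1 merges down without collapsing everything into one
# segment, so unchanged segments can survive and skip replication
curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=8'
```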

Bob

On Nov 1, 2011, at 12:46 AM, Jason Biggin wrote:

> Wondering if anyone has experience with replicating large indexes.  We have a 
> Solr deployment with 1 master, 1 master/slave and 5 slaves.  Our index 
> contains 15+ million articles and is ~55GB in size.
> 
> Performance is great on all systems.
> 
> Debian Linux
> Apache-Tomcat
> 100GB disk
> 6GB RAM
> 2 proc
> 
> on VMWare ESXi 4.0
> 
> 
> We notice however that whenever the master is optimized, the complete index 
> is replicated to the slaves.  This causes a 100%+ bloat in disk requirements.
> 
> Is this normal?  Is there a way around this?
> 
> Currently our optimize is configured as such:
> 
>       curl 
> 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'
> 
> Willing to share our experiences with Solr.
> 
> Thanks,
> Jason
