Optimization merges the index down to a single segment (one huge file), so the entire index is copied to the slaves on the next replication. So yes, you really do need 2x disk space in some cases.
Do you really need to optimize? We have a pretty big total index (about 200 million docs) and we never optimize. But we do have a sharded index, so our largest indexes are only around 10 million docs each. We use a merge factor of 2 and run replication every minute. In our tests, search performance was not much better after optimization, but that may be specific to our types of searches; you may see different results.

Bob

On Nov 1, 2011, at 12:46 AM, Jason Biggin wrote:

> Wondering if anyone has experience with replicating large indexes. We have a
> Solr deployment with 1 master, 1 master/slave and 5 slaves. Our index
> contains 15+ million articles and is ~55GB in size.
>
> Performance is great on all systems.
>
> Debian Linux
> Apache-Tomcat
> 100GB disk
> 6GB RAM
> 2 proc
>
> on VMWare ESXi 4.0
>
> We notice however that whenever the master is optimized, the complete index
> is replicated to the slaves. This causes a 100%+ bloat in disk requirements.
>
> Is this normal? Is there a way around this?
>
> Currently our optimize is configured as such:
>
> curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'
>
> Willing to share our experiences with Solr.
>
> Thanks,
> Jason
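If you do decide to keep optimizing, one middle ground is to raise maxSegments so the merge rewrites less of the index, meaning the slaves only pull the newly merged segments instead of the whole ~55GB. A minimal sketch, assuming the same localhost:8080 master as in Jason's curl command (the segment count of 10 is just an illustrative value to tune for your index):

```shell
#!/bin/sh
# Hypothetical master URL -- adjust host/port/core to your deployment.
SOLR_URL='http://localhost:8080/solr'

# Merging down to ~10 segments instead of 1 rewrites far less data,
# so replication transfers only the changed segments:
OPTIMIZE="$SOLR_URL/update?optimize=true&maxSegments=10&waitFlush=true&expungeDeletes=true"
echo "$OPTIMIZE"
# curl "$OPTIMIZE"   # uncomment to actually run against your master
```

The trade-off is that you keep some of the optimize benefit (fewer segments, deletes expunged) without triggering a full single-segment rewrite every time.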