I assume you don't have much free space available on your disk. Note that during optimization (merging into a single segment), a shard replica's disk usage may peak at 2x-3x its normal size until the optimization completes. Is that a problem? Not if optimization runs over the shards serially and your index is split into many small shards: only one replica is rewritten at a time, so the extra space you need is bounded by the size of a single small shard rather than the whole index.

On Apr 18, 2015 1:54 AM, "Rishi Easwaran" <rishi.easwa...@aol.com> wrote:
> Thanks Shawn for the quick reply.
> Our indexes are running on SSD, so 3 should be OK.
> Any recommendation on bumping it up?
>
> I guess we will have to run optimize for the entire SolrCloud and see if
> we can reclaim space.
>
> Thanks,
> Rishi.
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Fri, Apr 17, 2015 6:22 pm
> Subject: Re: Solr Cloud reclaiming disk space from deleted documents
>
> On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> > Running into an issue and wanted to see if anyone had some suggestions.
> > We are seeing this with both Solr 4.6 and 4.10.3 code.
> > We are running an extremely update-heavy application, with millions of
> > writes and deletes happening to our indexes constantly. An issue we are
> > seeing is that SolrCloud is not cleaning up deletes to reclaim disk
> > space that could be used for new inserts.
> >
> > We used to run optimize periodically with our old multicore setup; not
> > sure if that works for SolrCloud.
> >
> > Num Docs: 28762340
> > Max Doc: 48079586
> > Deleted Docs: 19317246
> >
> > Version 1429299216227
> > Gen 16525463
> > Size 109.92 GB
> >
> > In our solrconfig.xml we use the following configs.
> >
> > <indexConfig>
> >   <!-- Values here affect all index writers and act as a default unless overridden. -->
> >   <useCompoundFile>false</useCompoundFile>
> >   <maxBufferedDocs>1000</maxBufferedDocs>
> >   <maxMergeDocs>2147483647</maxMergeDocs>
> >   <maxFieldLength>10000</maxFieldLength>
> >   <mergeFactor>10</mergeFactor>
> >   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
> >   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >     <int name="maxThreadCount">3</int>
> >     <int name="maxMergeCount">15</int>
> >   </mergeScheduler>
> >   <ramBufferSizeMB>64</ramBufferSizeMB>
> > </indexConfig>
>
> This part of my response won't help the issue you wrote about, but it
> can affect performance, so I'm going to mention it. If your indexes are
> stored on regular spinning disks, reduce mergeScheduler/maxThreadCount
> to 1. If they are stored on SSD, then a value of 3 is OK. Spinning
> disks cannot do seeks (read/write head moves) fast enough to handle
> multiple merging threads properly. All the seek activity required will
> really slow down merging, which is a very bad thing when your indexing
> load is high. SSDs do not have to seek, so multiple threads are OK
> there.
>
> An optimize is the only way to reclaim all of the disk space held by
> deleted documents. Over time, as segments are merged automatically,
> deleted doc space will be automatically recovered, but it won't be
> perfect, especially as segments are merged multiple times into very
> large segments.
>
> If you send an optimize command to a core/collection in SolrCloud, the
> entire collection will be optimized ... the cloud will do one shard
> replica (core) at a time until the entire collection has been
> optimized. There is no way (currently) to ask it to only optimize a
> single core, or to do multiple cores simultaneously, even if they are
> on different servers.
>
> Thanks,
> Shawn
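The core stats in the original message and the 2x-3x peak estimate can be checked with a little arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes the space held by deleted documents is roughly proportional to their share of Max Doc (Lucene does not guarantee this), it assumes the reported 109.92 GB belongs to the same core as the counts, and the shard sizes in the headroom example are made up.

```python
from typing import Sequence


def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Fraction of Max Doc slots still occupied by deleted documents."""
    return (max_doc - num_docs) / max_doc


def optimize_headroom_gb(shard_sizes_gb: Sequence[float], peak_factor: float,
                         serial: bool = True) -> float:
    """Extra free disk (GB) needed while optimizing, beyond the index itself.

    peak_factor is the temporary multiple of a replica's size during the
    merge (the reply estimates 2x-3x). When replicas are optimized one at a
    time, only the largest one matters; if they all ran at once on the same
    disk, their extra space would add up.
    """
    extra = [size * (peak_factor - 1.0) for size in shard_sizes_gb]
    return max(extra) if serial else sum(extra)


if __name__ == "__main__":
    # Counters from the core stats quoted in the thread.
    num_docs, max_doc, deleted = 28_762_340, 48_079_586, 19_317_246
    assert max_doc - num_docs == deleted  # the three counters agree

    frac = deleted_fraction(num_docs, max_doc)
    print(f"deleted share of Max Doc: {frac:.1%}")
    # Assumption: the 109.92 GB size belongs to the same core, and deleted
    # docs occupy space in proportion to their count (only a rough guide).
    print(f"rough space held by deletes: {frac * 109.92:.1f} GB")

    # Hypothetical layout: 10 replicas of 11 GB each on one disk, peak 3x.
    sizes = [11.0] * 10
    print("serial headroom:", optimize_headroom_gb(sizes, 3.0), "GB")
    print("all-at-once headroom:",
          optimize_headroom_gb(sizes, 3.0, serial=False), "GB")
```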
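Shawn's description implies an optimize is triggered by a single request to the collection, after which SolrCloud works through the replicas one at a time. A minimal Python sketch of such a request, assuming Solr's standard `/update` handler and its `optimize`, `maxSegments` and `waitSearcher` parameters; the base URL, port and collection name are hypothetical placeholders, and the serial, one-replica-at-a-time behavior comes from Solr itself, not from this code.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 1,
                 wait_searcher: bool = True) -> str:
    """Build an update-handler URL that asks Solr to optimize a collection.

    Assumption: the collection exposes the standard /update handler.
    """
    params = {
        "optimize": "true",                  # merge segments, dropping deletes
        "maxSegments": str(max_segments),    # 1 = merge into a single segment
        "waitSearcher": "true" if wait_searcher else "false",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def send_optimize(url: str, timeout_s: float = 3600.0) -> int:
    """Send the request and return the HTTP status code.

    On a large index this can block for a long time, hence the long timeout.
    """
    with urlopen(url, timeout=timeout_s) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical host and collection name; replace with your own.
    url = optimize_url("http://localhost:8983/solr/", "collection1")
    print(url)
    # send_optimize(url)  # uncomment to actually trigger the optimize
```

Setting `wait_searcher=False` only skips waiting for the new searcher to open; it does not make the merge itself any faster, and the disk headroom discussed above is still needed.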