Re: Solr Cloud reclaiming disk space from deleted documents

Gili Nachum Sun, 19 Apr 2015 09:35:19 -0700

I assume you don't have much free space available on your disk. Note that
during optimization (merging into a single segment) your shard replica's space
usage may peak at 2x-3x its normal size until optimization completes.
Is that a problem? Not if optimization runs over the shards serially and your
index is split into many small shards.
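
To put rough numbers on that, here is a minimal sketch of the headroom arithmetic. The function name, the default peak factor of 3x, and the idea that temporary space is fully released before the next shard starts are all assumptions for illustration, not anything Solr reports; the 109.92 GB figure is the index size quoted below, treated here as a single replica.

```python
def peak_extra_space_gb(shard_sizes_gb, peak_factor=3.0, serial=True):
    """Estimate the extra free disk space (GB) needed while optimizing.

    shard_sizes_gb: on-disk sizes of the shard replicas stored on one node.
    peak_factor:    assumed peak usage as a multiple of normal size (2x-3x).
    serial:         True if replicas are optimized one at a time and each
                    one's temporary space is released before the next starts.
    """
    if not shard_sizes_gb:
        return 0.0
    # Extra space per replica is the peak minus what it already occupies.
    extra = [size * (peak_factor - 1.0) for size in shard_sizes_gb]
    # Serial: only the largest replica's overhead exists at any moment.
    # Parallel: every replica's overhead may exist at the same time.
    return max(extra) if serial else sum(extra)


# One 109.92 GB replica versus the same data split into ten equal shards.
single = peak_extra_space_gb([109.92])
split = peak_extra_space_gb([109.92 / 10] * 10)
print(f"single shard: {single:.2f} GB extra, ten shards: {split:.2f} GB extra")
```

Under these assumptions, splitting the same data into ten shards cuts the worst-case headroom by a factor of ten, which is why many small shards optimized serially are much easier to fit on a nearly full disk.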
On Apr 18, 2015 1:54 AM, "Rishi Easwaran" <rishi.easwa...@aol.com> wrote:


> Thanks Shawn for the quick reply.
> Our indexes are running on SSD, so 3 should be ok.
> Any recommendation on bumping it up?
>
> I guess we will have to run optimize for the entire solr cloud and see if
> we can reclaim space.
>
> Thanks,
> Rishi.
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Fri, Apr 17, 2015 6:22 pm
> Subject: Re: Solr Cloud reclaiming disk space from deleted documents
>
>
> On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> > Running into an issue and wanted to see if anyone had some suggestions.
> > We are seeing this with both solr 4.6 and 4.10.3 code.
> > We are running an extremely update heavy application, with millions of
> > writes and deletes happening to our indexes constantly.  An issue we are
> > seeing is with solr cloud reclaiming the disk space that can be used for
> > new inserts, by cleaning up deletes.
> >
> > We used to run optimize periodically with our old multicore set up, not
> > sure if that works for solr cloud.
> >
> > Num Docs: 28762340
> > Max Doc: 48079586
> > Deleted Docs: 19317246
> >
> > Version 1429299216227
> > Gen 16525463
> > Size 109.92 GB
> >
> > In our solrconfig.xml we use the following configs.
> >
> >     <indexConfig>
> >     <!-- Values here affect all index writers and act as a default unless overridden. -->
> >         <useCompoundFile>false</useCompoundFile>
> >         <maxBufferedDocs>1000</maxBufferedDocs>
> >         <maxMergeDocs>2147483647</maxMergeDocs>
> >         <maxFieldLength>10000</maxFieldLength>
> >
> >         <mergeFactor>10</mergeFactor>
> >         <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
> >         <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >             <int name="maxThreadCount">3</int>
> >             <int name="maxMergeCount">15</int>
> >         </mergeScheduler>
> >         <ramBufferSizeMB>64</ramBufferSizeMB>
> >
> >     </indexConfig>
>
> This part of my response won't help the issue you wrote about, but it can
> affect performance, so I'm going to mention it.  If your indexes are stored
> on regular spinning disks, reduce mergeScheduler/maxThreadCount to 1.  If
> they are stored on SSD, then a value of 3 is OK.  Spinning disks cannot do
> seeks (read/write head moves) fast enough to handle multiple merging threads
> properly.  All the seek activity required will really slow down merging,
> which is a very bad thing when your indexing load is high.  SSD disks do not
> have to seek, so multiple threads are OK there.
>
> An optimize is the only way to reclaim all of the disk space held by
> deleted documents.  Over time, as segments are merged automatically,
> deleted doc space will be automatically recovered, but it won't be
> perfect, especially as segments are merged multiple times into very large
> segments.
>
> If you send an optimize command to a core/collection in SolrCloud, the
> entire collection will be optimized ... the cloud will do one shard
> replica (core) at a time until the entire collection has been optimized.
> There is no way (currently) to ask it to only optimize a single core, or to
> do multiple cores simultaneously, even if they are on different servers.
>
> Thanks,
> Shawn
>
>
>
>
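
As a rough illustration of the thread above, the sketch below works out what share of the index the quoted stats attribute to deleted documents and builds the update-handler URL that requests an optimize. The host, port, and collection name are hypothetical placeholders, the helper names are made up for this example, and the fraction counts documents, so it only approximates reclaimable bytes if deleted and live documents are of similar size.

```python
from urllib.parse import urlencode


def deleted_fraction(num_docs, max_doc):
    """Share of document slots in the index that belong to deleted docs."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def optimize_url(base_url, collection, max_segments=1, wait_searcher=True):
    """Build an update-handler URL that asks Solr to optimize a collection.

    base_url and collection are placeholders supplied by the caller.
    """
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),
        "waitSearcher": str(wait_searcher).lower(),
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Stats quoted in the original post: Num Docs and Max Doc.
fraction = deleted_fraction(num_docs=28762340, max_doc=48079586)
print(f"deleted docs: {fraction:.1%} of the index")

# Hypothetical host and collection name; nothing is sent over the network.
print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

The URL is only constructed here, not requested, so the optimize can be issued separately once there is enough free disk space for it.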
