Re: Solr Cloud reclaiming disk space from deleted documents

Rishi Easwaran Fri, 17 Apr 2015 15:55:01 -0700

Thanks Shawn for the quick reply.
Our indexes are running on SSD, so 3 should be ok.
Any recommendation on bumping it up?

I guess we will have to run an optimize on the entire SolrCloud and see if
we can reclaim space.

Thanks,
Rishi. 

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
To: solr-user <solr-user@lucene.apache.org>
Sent: Fri, Apr 17, 2015 6:22 pm
Subject: Re: Solr Cloud reclaiming disk space from deleted documents

On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> Running into an issue and wanted to see if anyone had some suggestions.
> We are seeing this with both Solr 4.6 and 4.10.3 code.
> We are running an extremely update-heavy application, with millions of
> writes and deletes happening to our indexes constantly.  An issue we are
> seeing is with SolrCloud reclaiming the disk space that can be used for
> new inserts by cleaning up deletes.
>
> We used to run optimize periodically with our old multicore setup, not
> sure if that works for SolrCloud.
>
> Num Docs: 28762340
> Max Doc: 48079586
> Deleted Docs: 19317246
>
> Version 1429299216227
> Gen 16525463
> Size 109.92 GB
>
> In our solrconfig.xml we use the following configs.
>
>     <indexConfig>
>     <!-- Values here affect all index writers and act as a default unless overridden. -->
>         <useCompoundFile>false</useCompoundFile>
>         <maxBufferedDocs>1000</maxBufferedDocs>
>         <maxMergeDocs>2147483647</maxMergeDocs>
>         <maxFieldLength>10000</maxFieldLength>
>
>         <mergeFactor>10</mergeFactor>
>         <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
>         <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>             <int name="maxThreadCount">3</int>
>             <int name="maxMergeCount">15</int>
>         </mergeScheduler>
>         <ramBufferSizeMB>64</ramBufferSizeMB>
>     </indexConfig>

This part of my response won't help the issue you wrote about, but it
can affect performance, so I'm going to mention it.  If your indexes are
stored on regular spinning disks, reduce mergeScheduler/maxThreadCount
to 1.  If they are stored on SSD, then a value of 3 is OK.  Spinning
disks cannot do seeks (read/write head moves) fast enough to handle
multiple merging threads properly.  All the seek activity required will
really slow down merging, which is a very bad thing when your indexing
load is high.  SSD disks do not have to seek, so multiple threads are OK
there.
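
For reference, the spinning-disk variant of the quoted mergeScheduler
block might look like the sketch below.  It only changes maxThreadCount
to 1; maxMergeCount is simply carried over from the quoted config and is
not a recommendation.  On SSD the quoted value of 3 would stay as it is.

```xml
<!-- Sketch for indexes on spinning disks only; assumes the rest of
     <indexConfig> stays as in the quoted solrconfig.xml. -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <!-- One merge thread, so merges do not compete for disk seeks. -->
  <int name="maxThreadCount">1</int>
  <!-- Carried over unchanged from the quoted config. -->
  <int name="maxMergeCount">15</int>
</mergeScheduler>
```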

An optimize is the only way to reclaim all of the disk space held by
deleted documents.  Over time, as segments are merged automatically,
deleted doc space will be automatically recovered, but it won't be
perfect, especially as segments are merged multiple times into very
large segments.

If you send an optimize command to a core/collection in SolrCloud, the
entire collection will be optimized ... the cloud will do one shard
replica (core) at a time until the entire collection has been optimized.
There is no way (currently) to ask it to only optimize a single core, or
to do multiple cores simultaneously, even if they are on different
servers.
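
As a minimal sketch, an optimize can be requested by posting an XML
update message to the /update handler of the collection (or of any core
in it).  Which host and collection name to post to depends on the
installation and is not shown here; the waitSearcher attribute is
optional and is included only as an illustration.

```xml
<!-- Solr XML update message requesting an optimize.
     waitSearcher="false" is an illustrative choice: the request returns
     without waiting for the new searcher to be opened. -->
<optimize waitSearcher="false"/>
```

Given the behavior described above, sending this once should be enough
for the whole collection; it does not need to be repeated per core.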

Thanks,
Shawn
