Yeah, I noticed that. Looks like optimize won't work for us, since some of our disks are
already pretty full.
Any thoughts on increasing/decreasing <mergeFactor>10</mergeFactor> or tuning the
ConcurrentMergeScheduler to make Solr do merges faster?
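For example, something along these lines in solrconfig.xml is what I had in mind
(the values are only illustrative guesses, not tested recommendations; with
TieredMergePolicy, mergeFactor effectively maps to maxMergeAtOnce and
segmentsPerTier, and reclaimDeletesWeight is the knob that biases merge selection
toward segments holding a lot of deleted docs):

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <!-- how many segments are merged at once / allowed per tier -->
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
        <!-- default is 2.0; a higher value favors merging segments with deletes -->
        <double name="reclaimDeletesWeight">4.0</double>
    </mergePolicy>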


 

 

 

-----Original Message-----
From: Gili Nachum <gilinac...@gmail.com>
To: solr-user <solr-user@lucene.apache.org>
Sent: Sun, Apr 19, 2015 12:34 pm
Subject: Re: Solr Cloud reclaiming disk space from deleted documents


I assume you don't have much free space available on your disk. Note that
during optimization (a merge into a single segment), a shard replica's disk
usage may peak at 2x-3x its normal size until the optimization completes.
Is that a problem? Not if optimization runs over the shards serially and
your index is broken into many small shards.
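For reference, one way to trigger that optimize is an XML update message posted
to the collection's /update handler, for example:

    <optimize waitSearcher="false"/>

The optional maxSegments attribute lets the merge stop at more than one segment
instead of going all the way down to a single one.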
On Apr 18, 2015 1:54 AM, "Rishi Easwaran" <rishi.easwa...@aol.com> wrote:

> Thanks Shawn for the quick reply.
> Our indexes are running on SSD, so 3 should be OK.
> Any recommendation on bumping it up?
>
> I guess we will have to run optimize for the entire Solr Cloud and see if
> we can reclaim space.
>
> Thanks,
> Rishi.
>
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Fri, Apr 17, 2015 6:22 pm
> Subject: Re: Solr Cloud reclaiming disk space from deleted documents
>
>
> On 4/17/2015 2:15 PM, Rishi Easwaran wrote:
> > Running into an issue and wanted to see if anyone had some suggestions.
> > We are seeing this with both solr 4.6 and 4.10.3 code.
> > We are running an extremely update-heavy application, with millions of
> > writes and deletes happening to our indexes constantly.  An issue we are
> > seeing is that solr cloud is not reclaiming the disk space that could be
> > used for new inserts by cleaning up deletes.
> >
> > We used to run optimize periodically with our old multicore set up; not
> > sure if that works for solr cloud.
> >
> > Num Docs: 28762340
> > Max Doc: 48079586
> > Deleted Docs: 19317246
> >
> > Version 1429299216227
> > Gen 16525463
> > Size 109.92 GB
> >
> > In our solrconfig.xml we use the following configs:
> >
> >     <indexConfig>
> >         <!-- Values here affect all index writers and act as a default unless overridden. -->
> >         <useCompoundFile>false</useCompoundFile>
> >         <maxBufferedDocs>1000</maxBufferedDocs>
> >         <maxMergeDocs>2147483647</maxMergeDocs>
> >         <maxFieldLength>10000</maxFieldLength>
> >
> >         <mergeFactor>10</mergeFactor>
> >         <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"/>
> >         <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
> >             <int name="maxThreadCount">3</int>
> >             <int name="maxMergeCount">15</int>
> >         </mergeScheduler>
> >
> >         <ramBufferSizeMB>64</ramBufferSizeMB>
> >     </indexConfig>
>
> This part of my response won't help the issue you wrote about, but it can
> affect performance, so I'm going to mention it.  If your indexes are stored
> on regular spinning disks, reduce mergeScheduler/maxThreadCount to 1.  If
> they are stored on SSD, then a value of 3 is OK.  Spinning disks cannot do
> seeks (read/write head moves) fast enough to handle multiple merging
> threads properly.  All the seek activity required will really slow down
> merging, which is a very bad thing when your indexing load is high.  SSD
> disks do not have to seek, so multiple threads are OK there.
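> In the indexConfig you pasted, that would look something like this for
> spinning disks (leave it at 3 on SSD):
>
>         <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>             <!-- one merge thread to avoid seek thrashing on spinning disks -->
>             <int name="maxThreadCount">1</int>
>             <int name="maxMergeCount">15</int>
>         </mergeScheduler>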
>
> An optimize is the only way to reclaim all of the disk space held by
> deleted documents.  Over time, as segments are merged automatically,
> deleted doc space will be automatically recovered, but it won't be perfect,
> especially as segments are merged multiple times into very large segments.
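> For what it's worth, a lighter option than a full optimize is an
> expungeDeletes commit, which only merges segments that carry deletions
> (above a percentage threshold), for example:
>
>         <commit expungeDeletes="true" waitSearcher="false"/>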
>
> If you send an optimize command to a core/collection in SolrCloud, the
> entire collection will be optimized ... the cloud will do one shard replica
> (core) at a time until the entire collection has been optimized.  There is
> no way (currently) to ask it to only optimize a single core, or to do
> multiple cores simultaneously, even if they are on different servers.
>
> Thanks,
> Shawn
>
>
>
>

 
