Thank you David, Walt, Erick.

1. The first time the bloated index was generated, there was no disk space issue; one copy of the index is 1/6 of disk capacity. We ran into the disk capacity limit only after more than two bloated copies accumulated.
2. Solr was upgraded from 5.x. In 5.x, more than 5 segments caused a performance issue. Performance in 7.x has not been measured for an increasing segment count; I will plan a performance test to find the optimum number. The application runs incremental indexing multiple times in a work week.

I will keep you updated on the resolution. Thanks again.

On Tuesday, June 16, 2020, 07:34:26 PM EDT, Erick Erickson <erickerick...@gmail.com> wrote:

It Depends (tm).
As of Solr 7.5, optimize behaves differently. See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

So, assuming you have _not_ specified maxSegments=1, any very large segment (near 5G) that has _zero_ deleted documents won't be merged. So there are two scenarios:

1> What Walter mentioned: the optimize process runs out of disk space and leaves lots of crud around.
2> Your "older segments" are just max-sized segments with zero deletions.

All that said… do you have demonstrable performance improvements after optimizing? The very name "optimize" is misleading; of course, who wouldn't want an optimized index? In earlier versions of Solr (i.e. 4x), it made quite a difference. In more recent Solr releases, it's not as clear-cut. So before worrying about making optimize work, I'd recommend that you run some performance tests on optimized and un-optimized indexes. If there are significant improvements, that's one thing. Otherwise, it's a waste.

Best,
Erick

> On Jun 16, 2020, at 5:36 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> For a full forced merge (mistakenly named "optimize"), the worst-case disk space
> is 3X the size of the index. It is common to need 2X the size of the index.
>
> When I worked on Ultraseek Server 20+ years ago, it had the same merge behavior.
> I implemented a disk space check that would refuse to merge if there wasn't enough
> free space. It would log an error and send an email to the admin.
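The pre-flight check Walter describes could be sketched like this in Python. This is a minimal sketch, not Ultraseek's actual implementation; the 3X worst-case factor comes from his note above, and the index size in the example is illustrative:

```python
import shutil

def safe_to_optimize(index_dir: str, index_size_bytes: int, factor: int = 3) -> bool:
    """Refuse a forced merge unless worst-case scratch space is free.

    Worst case for a forced merge ("optimize") is ~3X the index size,
    per the discussion above; pass factor=2 for the common case.
    """
    free_bytes = shutil.disk_usage(index_dir).free
    return free_bytes >= factor * index_size_bytes

# Example: check the current directory (a real check would point at the
# core's data directory) against a hypothetical 50 GiB index.
if not safe_to_optimize(".", 50 * 2**30):
    print("Not enough free space for a forced merge; skipping optimize.")
```

The refusal itself (logging, alerting the admin) would go in the branch above instead of the `print`.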
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jun 16, 2020, at 1:58 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>
>> I can't give you a 100% true answer, but I've experienced this, and what
>> "seemed" to happen to me was that the optimize would start, driving the
>> size up threefold, and if you run out of disk space in the process, the
>> optimize quits (since it can't optimize) and leaves the live index
>> pieces intact, so now you have the "current" index as well as the
>> "optimized" fragments.
>>
>> I can't say for certain that's what you ran into, but we found that with
>> an expanding disk it will keep growing and prevent this from happening;
>> then the index contracts and the disk shrinks back to only what it
>> needs. That saved me a lot of headaches by never having to worry about
>> disk space.
>>
>> On Tue, Jun 16, 2020 at 4:43 PM Raveendra Yerraguntla
>> <raveend...@yahoo.com.invalid> wrote:
>>
>>> When the optimize command is issued, the expectation after the completion of
>>> the optimization process is that the index size either decreases or at most
>>> remains the same. In a Solr 7.6 cluster with 50-plus shards, when the optimize
>>> command is issued, some of the shards' transient or older segment files are
>>> not deleted. This happens randomly across shards. When unnoticed, these
>>> transient files fill the disk. Currently it is handled through monitors,
>>> but the question is what causes the transient/older files to remain.
>>> Are there any specific race conditions which leave the older files
>>> undeleted?
>>> Any pointers around this will be helpful.
>>> TIA
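For reference, the forced merge discussed in this thread is triggered through Solr's update handler with the optimize parameter, optionally with maxSegments (per Erick's note, omitting maxSegments=1 lets Solr 7.5+ leave max-sized, zero-deletion segments alone). A minimal sketch of building that request URL; the host, port, and collection name are placeholders, not values from the thread:

```python
from typing import Optional
from urllib.parse import urlencode

def optimize_url(base_url: str, collection: str,
                 max_segments: Optional[int] = None) -> str:
    """Build the update-handler URL that triggers a forced merge.

    Leaving max_segments as None lets Solr 7.5+ skip merging
    max-sized segments that have zero deleted documents.
    """
    params = {"optimize": "true", "waitSearcher": "true"}
    if max_segments is not None:
        params["maxSegments"] = str(max_segments)
    return f"{base_url}/solr/{collection}/update?{urlencode(params)}"

# Placeholder host and collection, for illustration only:
print(optimize_url("http://localhost:8983", "mycollection", max_segments=1))
```

An HTTP GET or POST to the resulting URL issues the merge; with maxSegments=1 you get the old single-segment behavior, which is exactly the case that needs the worst-case free-space headroom Walter describes.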