What Walter said. Although with Solr 7.6, unless you specify maxSegments explicitly, optimize won’t create segments over the default 5G maximum.
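For concreteness, here is a minimal sketch (Python with requests) of issuing an optimize without a maxSegments parameter, so that default 5G segment cap stays in effect. The host, port, and core name are placeholders; the same request can be made with curl.

    # Minimal sketch: issue optimize WITHOUT maxSegments so the default 5G
    # max segment size still applies. Host, port, and core name are placeholders.
    import requests

    SOLR_CORE = "http://localhost:8983/solr/mycore"  # placeholder core URL

    resp = requests.get(
        f"{SOLR_CORE}/update",
        params={"optimize": "true", "waitSearcher": "false", "wt": "json"},
        timeout=3600,  # forced merges can take a long time on big indexes
    )
    resp.raise_for_status()
    print(resp.json()["responseHeader"])

If you do pass maxSegments=1 here, you are back to rewriting everything into a single segment regardless of the 5G cap.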
And if you have in the past specified maxSegments so you have segments over 5G, optimize (again without specifying maxSegments) will do a “singleton merge” on them, i.e. it’ll rewrite each large segment into a single new segment with all the deleted data removed, thus gradually shrinking it. That shrinking also happens automatically as you delete documents (an update is a delete + add, so it counts), but you may still have a significant percentage of deleted docs in your index.

Best,
Erick

> On Jun 17, 2020, at 12:39 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> From that short description, you should not be running optimize at all.
>
> Just stop doing it. It doesn’t make that big a difference.
>
> It may take your indexes a few weeks to get back to a normal state after the forced merges.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jun 17, 2020, at 4:12 AM, Raveendra Yerraguntla <raveend...@yahoo.com.INVALID> wrote:
>>
>> Thank you David, Walt, Eric.
>> 1. The first time the bloated index was generated, there was no disk space issue; one copy of the index is 1/6 of disk capacity. We ran into disk capacity problems only after more than 2 bloated copies accumulated.
>> 2. Solr was upgraded from 5.*. In 5.*, more than 5 segments caused a performance issue. Performance in 7.* has not been measured for increasing segment counts; I will plan a PT to get the optimum number.
>> The application does incremental indexing multiple times in a work week.
>> I will keep you updated on the resolution.
>> Thanks again.
>> On Tuesday, June 16, 2020, 07:34:26 PM EDT, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> It Depends (tm).
>>
>> As of Solr 7.5, optimize is different. See:
>> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
>>
>> So, assuming you have _not_ specified maxSegments=1, any very large segment (near 5G) that has _zero_ deleted documents won’t be merged.
>>
>> So there are two scenarios:
>>
>> 1> What Walter mentioned: the optimize process runs out of disk space and leaves lots of crud around.
>>
>> 2> Your “older segments” are just max-sized segments with zero deletions.
>>
>> All that said… do you have demonstrable performance improvements after optimizing? The very name “optimize” is misleading; of course, who wouldn’t want an optimized index? In earlier versions of Solr (i.e. 4x) it made quite a difference. In more recent Solr releases, it’s not as clear cut. So before worrying about making optimize work, I’d recommend that you do some performance tests on optimized and un-optimized indexes. If there are significant improvements, that’s one thing. Otherwise, it’s a waste.
>>
>> Best,
>> Erick
>>
>>> On Jun 16, 2020, at 5:36 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> For a full forced merge (mistakenly named “optimize”), the worst-case disk space is 3X the size of the index. It is common to need 2X the size of the index.
>>>
>>> When I worked on Ultraseek Server 20+ years ago, it had the same merge behavior. I implemented a disk space check that would refuse to merge if there wasn’t enough free space. It would log an error and send an email to the admin.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Jun 16, 2020, at 1:58 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>
>>>> I can’t give you a 100% true answer, but I’ve experienced this, and what “seemed” to happen to me was that the optimize would start, which drives the size up threefold, and if you run out of disk space in the process the optimize quits (since it can’t complete) and leaves the live index pieces intact. So now you have the “current” index as well as the “optimized” fragments.
>>>>
>>>> I can’t say for certain that’s what you ran into, but we found that if you get an expanding disk it will keep growing and prevent this from happening; then the index contracts and the disk shrinks back to only what it needs. That saved me a lot of headaches, not needing to ever worry about disk space.
>>>>
>>>> On Tue, Jun 16, 2020 at 4:43 PM Raveendra Yerraguntla <raveend...@yahoo.com.invalid> wrote:
>>>>
>>>>> When the optimize command is issued, the expectation after the optimization process completes is that the index size either decreases or at most remains the same. In a Solr 7.6 cluster with 50-plus shards, when the optimize command is issued, some of the shards’ transient or older segment files are not deleted. This happens randomly across all shards. When unnoticed, these transient files fill the disk. Currently it is handled through monitors, but the question is what is causing the transient/older files to remain. Are there any specific race conditions that leave the older files undeleted?
>>>>> Any pointers around this will be helpful.
>>>>> TIA
>>>
>
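Following up on Walter’s Ultraseek anecdote: that kind of guard is easy to approximate outside Solr. Below is a rough sketch (Python, standard library only) that refuses to kick off a forced merge unless free disk space is at least 3X the current index size, the worst case quoted in this thread. The index path and the 3X factor are assumptions for illustration, not something Solr enforces for you.

    # Rough sketch of a pre-optimize disk space guard, in the spirit of the
    # Ultraseek check described above: refuse the forced merge unless free
    # space is at least 3x the index size. The index path is a placeholder.
    import shutil
    from pathlib import Path

    INDEX_DIR = Path("/var/solr/data/mycore/data/index")  # placeholder path

    index_bytes = sum(f.stat().st_size for f in INDEX_DIR.iterdir() if f.is_file())
    free_bytes = shutil.disk_usage(INDEX_DIR).free

    if free_bytes < 3 * index_bytes:
        raise SystemExit(
            f"Refusing to optimize: index is {index_bytes:,} bytes, only "
            f"{free_bytes:,} bytes free; want at least 3x the index size."
        )
    print("Enough headroom; safe to issue the optimize request.")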
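And since Erick’s advice boils down to “measure before you optimize”, a second small sketch: check what fraction of the index is deleted documents (via the Luke handler) before deciding whether a forced merge is worth running at all. If the percentage is small, optimize is unlikely to buy much. The core URL is again a placeholder.

    # Sketch: check the deleted-document percentage via the Luke handler
    # before deciding whether an optimize is worth it. Core URL is a placeholder.
    import requests

    SOLR_CORE = "http://localhost:8983/solr/mycore"

    info = requests.get(
        f"{SOLR_CORE}/admin/luke", params={"numTerms": 0, "wt": "json"}, timeout=30
    ).json()

    index = info["index"]
    max_doc, num_docs = index["maxDoc"], index["numDocs"]
    deleted_pct = 100.0 * (max_doc - num_docs) / max(max_doc, 1)
    print(f"{num_docs:,} live docs, {max_doc - num_docs:,} deleted "
          f"({deleted_pct:.1f}% of the index)")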