Joe:

Serendipity strikes. The thread titled "JVM Heap Memory Increase (SOLR CLOUD)" is a perfect example of why the optimize button is so "fraught".
Best,
Erick

On Sat, Apr 21, 2018 at 9:43 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Joe:
>
> Thanks for moving the conversation over here that we were having on
> the blog post. I think the wider audience will benefit from this going
> forward.
>
> bq: ...apparent inability to remove piles of deleted docs
>
> Do note that deleted docs are removed during normal indexing when
> segments are merged; they're not permanently retained in the index.
> Part of the thinking behind SOLR-7733 is exactly that once you press
> the very tempting optimize button, you can get into a situation where
> your one huge segment does _not_ have the deleted docs removed until
> the "live" document space is < 2.5G. Thus if you have a 100G segment
> after optimize, it'll look like deleted docs are never removed until
> at least 97.5% of the docs are deleted. The default max segment size
> is 5G, and the current algorithm doesn't consider segments eligible
> for merging until their "live" docs shrink below 50% of that maximum.
>
> The optimize functionality in the admin UI was removed as part of
> SOLR-7733 from the screen that comes up when you select a core, but
> the "core admin" screen still has the optimize button that comes and
> goes depending on whether there are any deleted documents or not. This
> page is only visible in standalone mode.
>
> Unfortunately SOLR-7733 removed the functionality that actually sent
> the optimize command from the javascript, so pressing the optimize
> button does nothing. This is indeed a bug; see SOLR-12253, which will
> remove the button from the core admin screen in standalone mode.
>
> Optimize (aka forceMerge) is pretty actively discouraged because it:
> 1> is very expensive
> 2> has significant "gotchas" (we chatted in comments in the blog post
> about the gotchas).
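To make the 97.5% figure in the SOLR-7733 paragraph above concrete, here is a minimal sketch of the arithmetic. It is not Lucene's actual code: the function name and parameters are made up for illustration, and it assumes deleted docs free space in proportion to their share of the segment.

```python
def deleted_fraction_before_merge(segment_gb, max_segment_gb=5.0, live_ratio=0.5):
    """Fraction of a segment that must be deleted before its "live" data
    falls to live_ratio * max_segment_gb (2.5G with the defaults quoted above).

    Hypothetical helper: assumes space is freed in proportion to deleted docs.
    """
    live_threshold_gb = live_ratio * max_segment_gb
    if segment_gb <= live_threshold_gb:
        # Already small enough to be considered for merging.
        return 0.0
    return 1.0 - live_threshold_gb / segment_gb


if __name__ == "__main__":
    for size_gb in (5, 20, 100):
        needed = deleted_fraction_before_merge(size_gb)
        print(f"{size_gb:>4}G segment: {needed:.1%} deleted before it is merge-eligible")
```

For a 100G segment this gives the 97.5% quoted above; a segment at the 5G default maximum needs only half of its docs deleted.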
>
> So we made a decision to make it more of an 'expert' option, requiring
> users to issue a curl/Browser URL command like
> "....solr/core_or_collection/update?optimize=true" if this
> functionality is really desirable in their situation. Docs will be
> updated too; they're lagging a bit.
>
> Coming probably in Solr 7.4 is a new parameter (tentatively) for
> TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of
> deleted docs in an index. The current version of this patch
> (LUCENE-7976) sets this threshold at 20% at the expense of about 10%
> more I/O in my tests than the current TMP implementation. Under
> discussion is how low to allow this to be (we're thinking 10% as a
> floor) and what the default should be. The current TMP caps the
> percentage of deleted docs at close to 50%.
>
> The thinking behind not allowing the percentage of deleted documents
> to be too low is that that would trigger its own massive I/O issues,
> rewriting "live" documents over and over and over. For NRT indexes,
> that's almost certainly a horrible tradeoff. For more static indexes,
> the "expert" API command is still available.
>
> Best,
> Erick
>
> On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
>> In Solr v7.3.0 the ability to remove "deleted" docs from a core by use
>> of what until then was the Optimise button on the admin GUI has been changed
>> in an ungood way. That is, in the v7.3.0 Changes list, item SOLR-7733 (quote:
>> remove "optimize" from the UI, end quote). The result of that is an apparent
>> inability to remove piles of deleted docs, which amongst other things means
>> wasting disk space. That is a marked step backward and is unhelpful for use
>> of Solr in the field. As other comments in the now closed 7733 ticket
>> explain, this is a user item which has impact on their site, and it ought to
>> be an inherent feature of Solr.
>> Consider a file system where complete
>> deletes are forbidden, or your kitchen where taking out the rubbish is
>> denied. Hand waving about obscure auto-sizing notions will not suffice. Thus
>> may I urge that the Optimise button and operation be returned to use, as it
>> was until Solr v7.3.0.
>> Thanks,
>> Joe D.
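For anyone who still wants the "expert" route mentioned earlier in the thread, the following is a rough sketch of building and sending the `update?optimize=true` request from Python. The thread elides the host part of the URL, so the base URL and core name here are placeholders to substitute with your own, and the helper names are invented for this example. Given the cost and gotchas discussed above, treat it as something to run deliberately, not routinely.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_optimize_url(solr_base_url, core_or_collection):
    """Build the optimize URL in the form quoted in the thread.

    solr_base_url is an assumption (e.g. "http://localhost:8983/solr");
    the original message does not spell out the host part.
    """
    base = solr_base_url.rstrip("/")
    name = quote(core_or_collection, safe="")
    return f"{base}/{name}/update?optimize=true"


def send_optimize(solr_base_url, core_or_collection, timeout_s=3600):
    """Issue the request and return the HTTP status code.

    An optimize can run for a long time on a large index, hence the
    generous (and arbitrary) default timeout.
    """
    url = build_optimize_url(solr_base_url, core_or_collection)
    with urlopen(url, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the URL; call send_optimize() yourself when you mean it.
    print(build_optimize_url("http://localhost:8983/solr", "mycore"))
```

Keeping URL construction separate from the network call makes it easy to check the exact request before triggering an expensive merge.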