Joe: Thanks for moving the conversation we were having on the blog post over here. I think the wider audience will benefit from this going forward.
bq: ...apparent inability to remove piles of deleted docs

Do note that deleted docs are removed during normal indexing when segments are merged; they're not permanently retained in the index. Part of the thinking behind SOLR-7733 is exactly that once you press the very tempting optimize button, you can get into a situation where your one huge segment does _not_ have the deleted docs removed until the "live" document space is < 2.5G. Thus if you have a 100G segment after optimize, it'll look like deleted docs are never removed until at least 97.5% of the docs are deleted. The default max segment size is 5G, and the current algorithm doesn't consider a segment eligible for merging until its "live" docs take up less than 50% of that maximum.

The optimize functionality in the admin UI was removed as part of SOLR-7733 from the screen that comes up when you select a core, but the "core admin" screen still has the optimize button that comes and goes depending on whether there are any deleted documents or not. This page is only visible in standalone mode.

Unfortunately SOLR-7733 removed the functionality that actually sent the optimize command from the javascript, so pressing the optimize button does nothing. This is indeed a bug, see SOLR-12253, which will remove the button from the core admin screen in standalone mode.

Optimize (aka forceMerge) is pretty actively discouraged because it:
1> is very expensive
2> has significant "gotchas" (we chatted about the gotchas in the comments on the blog post).

So we made a decision to make it more of an 'expert' option, requiring users to issue a curl/browser URL command like "....solr/core_or_collection/update?optimize=true" if this functionality is really desirable in their situation. Docs will be updated too; they're lagging a bit.

Coming probably in Solr 7.4 is a new parameter (tentatively) for TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of deleted docs in an index.
The current version of this patch (LUCENE-7976) sets this threshold at 20%, at the expense of about 10% more I/O in my tests compared with the current TMP implementation. Under discussion is how low to allow this to be (we're thinking 10% as a floor) and what the default should be. The current TMP caps the percentage of deleted docs at close to 50%.

The thinking behind not allowing the percentage of deleted documents to be too low is that it would trigger its own massive I/O issues, rewriting "live" documents over and over and over. For NRT indexes, that's almost certainly a horrible tradeoff. For more static indexes, the "expert" API command is still available.

Best,
Erick

On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
> In Solr v7.3.0 the ability to remove "deleted" docs from a core by use
> of what until then was the Optimise button on the admin GUI has been changed
> in an ungood way. That is, in the v7.3.0 Changes list, item SOLR-7733 (quote
> remove "optimize" from the UI, end quote). The result of that is an apparent
> inability to remove piles of deleted docs, which amongst other things means
> wasting disk space. That is a marked step backward and is unhelpful for use
> of Solr in the field. As other comments in the now closed 7733 ticket
> explain, this is a user item which has impact on their site, and it ought to
> be an inherent feature of Solr. Consider a file system where complete
> deletes are forbidden, or your kitchen where taking out the rubbish is
> denied. Hand waving about obscure auto-sizing notions will not suffice. Thus
> may I urge that the Optimise button and operation be returned to use, as it
> was until Solr v7.3.0.
> Thanks,
> Joe D.
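
For anyone who wants to check the segment arithmetic in the reply above (5G max segment size, eligibility once "live" data is under 50% of that, hence 97.5% deletes for a 100G segment), here is a rough sketch. It is only an illustration of the numbers, not Lucene's TieredMergePolicy code: the function name and parameters are made up for this example, and it assumes documents are roughly uniform in size so that the fraction of deleted docs tracks the fraction of bytes.

```python
def deleted_fraction_needed(segment_gb, max_segment_gb=5.0, live_ratio=0.5):
    """Fraction of a segment that must be deleted before its live data
    drops to live_ratio * max_segment_gb (2.5G with the values quoted above).

    Hypothetical helper for illustration only; assumes uniform doc sizes.
    """
    live_threshold_gb = max_segment_gb * live_ratio
    if segment_gb <= live_threshold_gb:
        # Already at or under the threshold, so no deletes are required.
        return 0.0
    return 1.0 - live_threshold_gb / segment_gb


# Show how the required delete percentage grows with segment size.
for size_gb in (5, 20, 100):
    needed = deleted_fraction_needed(size_gb)
    print(f"{size_gb:>4}G segment: {needed:.1%} of docs deleted before it is eligible")
```

The point of the sketch is simply that the bigger the single segment left behind by an optimize, the closer to 100% deletes you need before it is looked at again, which is why a normally merged index (segments at or below 5G) tops out near 50%.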