Joe:

Thanks for moving the conversation we were having on the blog post
over here. I think the wider audience will benefit from it going
forward.

bq: ...apparent inability to remove piles of deleted docs

Do note that deleted docs are removed during normal indexing when
segments are merged; they're not permanently retained in the index.
Part of the thinking behind SOLR-7733 is exactly that once you press
the very tempting optimize button, you can get into a situation where
your one huge segment does _not_ have the deleted docs removed until
the "live" document space is < 2.5G. Thus if you have a 100G segment
after optimize, it'll look like deleted docs are never removed until
at least 97.5% of the docs are deleted. The default max segment size
is 5G, and the current algorithm doesn't consider a segment eligible
for merging until its "live" docs make up less than 50% of that
maximum size.
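
To make the arithmetic above concrete, here is a minimal sketch in
Python. It is only an illustration, not how TieredMergePolicy is
implemented: the function name is made up, and it assumes deleted docs
take up disk space in proportion to their share of the segment.

```python
# Back-of-the-envelope arithmetic only; not TieredMergePolicy's actual code.
DEFAULT_MAX_SEGMENT_GB = 5.0    # default max segment size mentioned above
LIVE_FRACTION_THRESHOLD = 0.5   # eligible once live data < 50% of that max


def deleted_fraction_before_merge(segment_gb, max_segment_gb=DEFAULT_MAX_SEGMENT_GB):
    """Fraction of a segment that must be deleted before it is merge-eligible.

    Assumes (for illustration) that space is proportional to doc count.
    """
    live_limit_gb = max_segment_gb * LIVE_FRACTION_THRESHOLD  # 2.5G by default
    if segment_gb <= live_limit_gb:
        # Already below the live-data limit, nothing needs to be deleted first.
        return 0.0
    return 1.0 - live_limit_gb / segment_gb


if __name__ == "__main__":
    for size_gb in (5, 20, 100):
        frac = deleted_fraction_before_merge(size_gb)
        print(f"{size_gb}G segment: {frac:.1%} deleted before it is eligible")
```

For the 100G case this gives the 97.5% figure quoted above; a segment
at the 5G default needs 50% deleted.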

The optimize functionality in the admin UI was removed as part of
SOLR-7733 from the screen that comes up when you select a core, but
the "core admin" screen still has the optimize button that comes and
goes depending on whether there are any deleted documents or not. This
page is only visible in standalone mode.

Unfortunately SOLR-7733 removed the functionality that actually sent
the optimize command from the JavaScript, so pressing the optimize
button does nothing. This is indeed a bug; see SOLR-12253, which will
remove the button from the core admin screen in standalone mode.

Optimize (aka forceMerge) is pretty actively discouraged because it:
1> is very expensive
2> has significant "gotchas" (we chatted in comments in the blog post
about the gotchas).

So we made a decision to make it more of an 'expert' option, requiring
users to issue a curl/browser URL command like
"....solr/core_or_collection/update?optimize=true" if this
functionality is really desirable in their situation. The docs will be
updated too; they're lagging a bit.
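
For anyone scripting this rather than typing the URL by hand, here is
a small sketch that just assembles the URL pattern above. The helper
name, the base URL (http://localhost:8983) and the core name "mycore"
are placeholders for illustration; substitute your own.

```python
from urllib.parse import quote, urlencode


def build_optimize_url(base_url, core_or_collection):
    """Build the .../solr/<core_or_collection>/update?optimize=true URL.

    base_url and core_or_collection are whatever applies to your install;
    the values used below are placeholders only.
    """
    path = f"{base_url.rstrip('/')}/solr/{quote(core_or_collection, safe='')}/update"
    return f"{path}?{urlencode({'optimize': 'true'})}"


if __name__ == "__main__":
    # Placeholder host/port and core name.
    print(build_optimize_url("http://localhost:8983", "mycore"))
```

Fetching the resulting URL (with curl, a browser, or
urllib.request.urlopen) is what actually triggers the optimize, so keep
the expense and the gotchas above in mind before doing so.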

Probably coming in Solr 7.4 is a new (tentative) parameter for
TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of
deleted docs in an index. The current version of this patch
(LUCENE-7976) sets this threshold at 20%, at the expense of about 10%
more I/O in my tests compared with the current TMP implementation.
Still under discussion are how low to allow this to be (we're thinking
10% as a floor) and what the default should be. The current TMP caps
the percentage of deleted docs at close to 50%.
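
If you want to see where an index currently sits relative to such a
ceiling, the percentage of deleted docs can be worked out from the
numDocs and maxDoc figures Solr reports (maxDoc includes deleted
docs). A rough sketch; the function names are made up, and the 20%
default only mirrors the threshold in the current patch, which may
change:

```python
def percent_deleted(num_docs, max_doc):
    """Percentage of docs in the index that are deleted.

    num_docs: live documents; max_doc: live plus deleted documents.
    """
    if max_doc <= 0:
        return 0.0
    return 100.0 * (max_doc - num_docs) / max_doc


def exceeds_ceiling(num_docs, max_doc, ceiling_pct=20.0):
    """True if the deleted percentage is above the (assumed) soft ceiling."""
    return percent_deleted(num_docs, max_doc) > ceiling_pct


if __name__ == "__main__":
    # Illustrative numbers only: 700 live docs out of 1000 total.
    print(percent_deleted(700, 1000), exceeds_ceiling(700, 1000))
```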

The thinking behind not allowing the percentage of deleted documents
to be too low is that doing so would trigger its own massive I/O
issues, rewriting "live" documents over and over and over. For NRT
indexes, that's almost certainly a horrible tradeoff. For more static
indexes, the "expert" API command is still available.

Best,
Erick

On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
>     In Solr v7.3.0 the ability to remove "deleted" docs from a core by use
> of what until then was the Optimise button on the admin GUI has been changed
> in an ungood way. That is, in the V7.3.0 Changes list, item SOLR 7733 (quote
> remove "optimize" from the UI, end quote). The result of that is an apparent
> inability to remove piles of deleted docs, which amongst other things means
> wasting disk space. That is a marked step backward and is unhelpful for use
> of Solr in the field. As other comments in the now closed 7733 ticket
> explain, this is a user item which has impact on their site, and it ought to
> be an inherent feature of Solr. Consider a file system where complete
> deletes are forbidden, or your kitchen where taking out the rubbish is
> denied. Hand waving about obscure auto-sizing notions will not suffice. Thus
> may I urge that the Optimise button and operation be returned to use, as it
> was until Solr v7.3.0.
>     Thanks,
>     Joe D.
