Joe:

Serendipity strikes. The thread titled "JVM Heap Memory Increase (SOLR
CLOUD)" is a perfect example of why the optimize button is so
"fraught".

Best,
Erick

On Sat, Apr 21, 2018 at 9:43 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Joe:
>
> Thanks for moving the conversation over here that we were having on
> the blog post. I think the wider audience will benefit from this going
> forward.
>
> bq: ...apparent inability to remove piles of deleted docs
>
> Do note that deleted docs are removed during normal indexing when
> segments are merged; they're not permanently retained in the index.
> Part of the thinking behind SOLR-7733 is exactly that once you press
> the very tempting optimize button, you can get into a situation where
> your one huge segment does _not_ have the deleted docs removed until
> the "live" document space is < 2.5G. Thus if you have a 100G segment
> after optimize, it'll look like deleted docs are never removed until
> at least 97.5% of the docs are deleted. The default max segment size
> is 5G, and the current algorithm doesn't consider a segment eligible
> for merging until its "live" docs take up less than 50% of that maximum.
>
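
The 97.5% figure above is just arithmetic, and a minimal sketch may make it easier to plug in other segment sizes. This assumes only the numbers quoted in the paragraph (5G default max segment size, 50% "live" threshold); the function and parameter names are made up for illustration, and it does not model TieredMergePolicy's actual selection logic.

```python
def deleted_fraction_before_merge(segment_gb, max_segment_gb=5.0, live_fraction=0.5):
    """Fraction of a segment that must be deleted before its "live" data
    drops below live_fraction * max_segment_gb (illustrative arithmetic only)."""
    live_threshold_gb = max_segment_gb * live_fraction  # 2.5G with the defaults
    if segment_gb <= live_threshold_gb:
        # Simplification: a segment already under the threshold needs no deletes.
        return 0.0
    return 1.0 - live_threshold_gb / segment_gb


if __name__ == "__main__":
    # The 100G segment from the example above.
    print(f"{deleted_fraction_before_merge(100.0):.1%}")
```

The larger the optimized segment, the closer this fraction gets to 100%, which is why deleted docs appear to never go away after an optimize.
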
> The optimize functionality in the admin UI was removed as part of
> SOLR-7733 from the screen that comes up when you select a core, but
> the "core admin" screen still has the optimize button that comes and
> goes depending on whether there are any deleted documents or not. This
> page is only visible in standalone mode.
>
> Unfortunately SOLR-7733 removed the functionality that actually sent
> the optimize command from the javascript, so pressing the optimize
> button does nothing. This is indeed a bug; see SOLR-12253, which will
> remove the button from the core admin screen in standalone mode.
>
> Optimize (aka forceMerge) is pretty actively discouraged because it:
> 1> is very expensive
> 2> has significant "gotchas" (we chatted in comments in the blog post
> about the gotchas).
>
> So we made a decision to make it more of an 'expert' option, requiring
> users to issue a curl/Browser URL command like
> "....solr/core_or_collection/update?optimize=true" if this
> functionality is really desirable in their situation. The docs will be
> updated too; they're lagging a bit.
>
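
For anyone scripting the "expert" command rather than pasting it into a browser, here is a small sketch that only builds the URL. The base URL (`http://localhost:8983`) and the core name are assumptions for illustration, not something stated above; substitute whatever host, port, and core or collection your installation uses.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core_or_collection):
    """Build the update?optimize=true URL for a core or collection.

    base_url is a hypothetical host prefix such as "http://localhost:8983".
    """
    query = urlencode({"optimize": "true"})
    return f"{base_url.rstrip('/')}/solr/{core_or_collection}/update?{query}"


if __name__ == "__main__":
    # "mycore" is a placeholder name.
    print(optimize_url("http://localhost:8983", "mycore"))
```

The resulting string can be handed to curl or a browser. Given the cost and the gotchas listed above, it is probably best kept out of anything that runs automatically.
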
> Coming probably in Solr 7.4 is a new parameter (tentatively) for
> TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of
> deleted docs in an index. The current version of this patch
> (LUCENE-7976) sets this threshold at 20% at the expense of about 10%
> more I/O in my tests compared to the current TMP implementation. Still
> under discussion are how low to allow this to be (we're thinking 10% as
> a floor) and what the default should be. The current TMP caps the
> percentage of deleted docs at close to 50%.
>
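
To see where an index stands against a ceiling like that, the percentage of deleted docs can be worked out from the two standard Lucene counts: maxDoc (which includes deleted docs) and numDocs (live docs only). The sketch below is a rough illustration; the function names and the `ceiling_pct` parameter are hypothetical and are not the name of the new TMP setting, and 20% is just the tentative figure from the patch.

```python
def percent_deleted(max_doc, num_docs):
    """Percentage of docs in the index that are deleted but not yet merged away."""
    if max_doc <= 0:
        return 0.0
    return 100.0 * (max_doc - num_docs) / max_doc


def exceeds_soft_ceiling(max_doc, num_docs, ceiling_pct=20.0):
    """True if the deleted percentage is above the (assumed) soft ceiling."""
    return percent_deleted(max_doc, num_docs) > ceiling_pct


if __name__ == "__main__":
    # Example: 1,000 docs in the index, 750 of them live.
    print(percent_deleted(1000, 750), exceeds_soft_ceiling(1000, 750))
```
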
> The thinking behind not allowing the percentage of deleted documents to
> be too low is that doing so would trigger its own massive I/O issues,
> rewriting "live" documents over and over and over. For NRT indexes,
> that's almost certainly a horrible tradeoff. For more static indexes,
> the "expert" API command is still available.
>
> Best,
> Erick
>
> On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
>>     In Solr v7.3.0 the ability to remove "deleted" docs from a core by use
>> of what until then was the Optimise button on the admin GUI has been changed
>> in an ungood way. That is, in the v7.3.0 Changes list, item SOLR-7733 (quote
>> remove "optimize" from the UI, end quote). The result of that is an apparent
>> inability to remove piles of deleted docs, which amongst other things means
>> wasting disk space. That is a marked step backward and is unhelpful for use
>> of Solr in the field. As other comments in the now closed 7733 ticket
>> explain, this is a user item which has impact on their site, and it ought to
>> be an inherent feature of Solr. Consider a file system where complete
>> deletes are forbidden, or your kitchen where taking out the rubbish is
>> denied. Hand waving about obscure auto-sizing notions will not suffice. Thus
>> may I urge that the Optimise button and operation be returned to use, as it
>> was until Solr v7.3.0.
>>     Thanks,
>>     Joe D.
