Yeah, trying to have something that satisfies all use cases is a bear.
I know of one installation where the indexing rate was so huge that
they couldn't afford to have any merging (80B docs/day) so in that
situation any heuristics built into Solr would be wrong.

Here's an alternative to having a button that you have to attend to
each day:

http://localhost:8983/solr/admin/cores?action=STATUS

returns, for each core, the number of docs (numDocs), maxDoc, and deleted docs (deletedDocs).
One could set up a cron job that runs every night at 3:00 am that then
sends the optimize command to any core with greater than X% deleted
docs, where X is your locally-determined threshold. That would be less
work actually than having to attend to it every day.
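
A rough sketch of that nightly job in Python, not a tested production script. It assumes the STATUS response is requested as JSON (wt=json) and that each core's "index" section carries maxDoc and deletedDocs; the 20% threshold, the --run flag and the function names are placeholders of mine, so check them against your own installation first.

```python
import json
import sys
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # same host/port as the STATUS URL above
THRESHOLD_PCT = 20.0                     # "X": placeholder, choose your own value


def deleted_pct(index):
    """Deleted docs as a percentage of maxDoc; 0.0 for an empty index."""
    max_doc = index.get("maxDoc", 0)
    if max_doc <= 0:
        return 0.0
    return 100.0 * index.get("deletedDocs", 0) / max_doc


def cores_over_threshold(status_response, threshold_pct):
    """Sorted names of cores whose deleted-doc percentage exceeds the threshold."""
    cores = status_response.get("status", {})
    return sorted(
        name for name, info in cores.items()
        if deleted_pct(info.get("index", {})) > threshold_pct
    )


def fetch_status(base_url):
    """Fetch the core STATUS report (assumed to be JSON with wt=json)."""
    url = base_url + "/admin/cores?action=STATUS&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def optimize(base_url, core):
    """Send the 'expert' optimize command to a single core."""
    url = "%s/%s/update?optimize=true" % (base_url, urllib.parse.quote(core))
    with urllib.request.urlopen(url) as resp:
        resp.read()


def main():
    status = fetch_status(SOLR_URL)
    for core in cores_over_threshold(status, THRESHOLD_PCT):
        print("optimizing", core)
        optimize(SOLR_URL, core)


# Only contact Solr when asked explicitly, so an accidental run does nothing.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

A crontab entry along the lines of "0 3 * * * python3 /path/to/optimize_cores.py --run" (the path is hypothetical) would run it at 3:00 am. Keeping the threshold check in its own function makes it easy to try against a saved STATUS response before letting it loose on a live index.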

FWIW

On Sat, Apr 21, 2018 at 10:55 AM, Joe Doupnik <j...@netlab1.net> wrote:
>     A good find Erick, and one which brings into focus the real problem at
> hand. That overload case would happen if there were an Optimise button or if
> the curl equivalent command were issued, and is not a reason to avoid
> either/both.
>     So, what could be done to avoid such awkward difficulties?
>     Well, an obvious suggestion, without knowing the details, is: might the
> system be able to estimate internal conditions sufficiently to issue a
> warning and decline an Optimise? Certainly average system managers are not
> about to decode and monitor Java VM nuances.
>     Discussion about automating removals based on sizes of this and that
> seems, from this distance, to be musings yet to face the real world. In the
> meantime we need to control matters, hence the button request.
>     The resource consumption issue is inherent in such systems, and we in
> the field have very little information to help make choices. I know, it's
> not a simple affair, and too many buzz words fly about. Thus the engineers
> close to the code might have a ponder about the above predictive capability
> and about the overall resource consumption process which might permit the
> system to adapt to progressively larger loads over time.
>     In my own situation I feed material into Solr a file at a time, give a
> small pause, repeat, get to 100 entries and wait a bit longer, and so on
> every file, hundred files, thousand files. This works well to reduce
> resource peaks and uncompleted operations, and it lets the system run in the
> background all day if necessary without disturbing main activities. My
> longest run was over a full day, 660+K documents which worked just fine and
> did not upset other activities in the machine.
>     Thanks,
>     Joe D.
>
>
>
> On 21/04/2018 17:54, Erick Erickson wrote:
>>
>> Joe:
>>
>> Serendipity strikes. The thread titled "JVM Heap Memory Increase (SOLR
>> CLOUD)" is a perfect example of why the optimize button is so
>> "fraught".
>>
>> Best,
>> Erick
>>
>> On Sat, Apr 21, 2018 at 9:43 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>>
>>> Joe:
>>>
>>> Thanks for moving the conversation over here that we were having on
>>> the blog post. I think the wider audience will benefit from this going
>>> forward.
>>>
>>> bq: ...apparent inability to remove piles of deleted docs
>>>
>>> Do note that deleted docs are removed during normal indexing when
>>> segments are merged, they're not permanently retained in the index.
>>> Part of the thinking behind SOLR-7733 is exactly that once you press
>>> the very tempting optimize button, you can get into a situation where
>>> your one huge segment does _not_ have the deleted docs removed until
>>> the "live" document space is < 2.5G. Thus if you have a 100G segment
>>> after optimize, it'll look like deleted docs are never removed until
>>> at least 97.5% of the docs are deleted. The default max segment size
>>> is 5G, and the current algorithm doesn't consider a segment eligible
>>> for merging until its "live" docs fall below 50% of that maximum size.
>>>
>>> The optimize functionality in the admin UI was removed as part of
>>> SOLR-7733 from the screen that comes up when you select a core, but
>>> the "core admin" screen still has the optimize button that comes and
>>> goes depending on whether there are any deleted documents or not. This
>>> page is only visible in standalone mode.
>>>
>>> Unfortunately SOLR-7733 removed the functionality that actually sent
>>> the optimize command from the javascript, so pressing the optimize
>>> button does nothing. This is indeed a bug, see: SOLR-12253 which will
>>> remove the button from the core admin screen in stand-alone mode.
>>>
>>> Optimize (aka forceMerge) is pretty actively discouraged because it:
>>> 1> is very expensive
>>> 2> has significant "gotchas" (we chatted in comments in the blog post
>>> about the gotchas).
>>>
>>> So we made a decision to make it more of an 'expert' option, requiring
>>> users to issue a curl/Browser URL command like
>>> "....solr/core_or_collection/update?optimize=true" if this
>>> functionality is really desirable in their situation. Docs will be
>>> updated too, they're lagging a bit.
>>>
>>> Coming probably in Solr 7.4 is a new parameter (tentatively) for
>>> TieredMergePolicy (TMP) that puts a soft ceiling on the percentage of
>>> deleted docs in an index. The current version of this patch
>>> (LUCENE-7976) sets this threshold at 20% at the expense of about 10%
>>> more I/O in my tests from the current TMP implementation. Under
>>> discussion is how low to allow this to be, we're thinking 10% as a
>>> floor, and what the default should be. The current TMP caps the
>>> percentage of deleted docs at close to 50%.
>>>
>>> The thinking behind not allowing the percent deleted documents to be
>>> too low is that that would trigger its own massive I/O issues,
>>> rewriting "live" documents over and over and over. For NRT indexes,
>>> that's almost certainly a horrible tradeoff. For more static indexes,
>>> the "expert" API command is still available.
>>>
>>> Best,
>>> Erick
>>>
>>> On Sat, Apr 21, 2018 at 5:08 AM, Joe Doupnik <j...@netlab1.net> wrote:
>>>>
>>>>      In Solr v7.3.0 the ability to remove "deleted" docs from a core by
>>>> use of what until then was the Optimise button on the admin GUI has been
>>>> changed in an ungood way. That is, in the v7.3.0 Changes list, item
>>>> SOLR-7733 (quote: remove "optimize" from the UI, end quote). The result
>>>> of that is an apparent inability to remove piles of deleted docs, which
>>>> amongst other things means wasting disk space. That is a marked step
>>>> backward and is unhelpful for use of Solr in the field. As other comments
>>>> in the now closed 7733 ticket explain, this is a user item which has
>>>> impact on their site, and it ought to be an inherent feature of Solr.
>>>> Consider a file system where complete deletes are forbidden, or your
>>>> kitchen where taking out the rubbish is denied. Hand waving about obscure
>>>> auto-sizing notions will not suffice. Thus may I urge that the Optimise
>>>> button and operation be returned to use, as it was until Solr v7.3.0.
>>>>      Thanks,
>>>>      Joe D.
>
>
