On 4/10/2017 1:57 AM, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs. I am considering
> limiting the *maxWarmingSearchers* count in configuration but want to
> be sure that nothing breaks in production in case simultaneous commits
> do happen afterwards.

Don't do commits from multiple sources.  A good general practice with
Solr is to either use autoSoftCommit or add a commitWithin parameter to
each indexing request, so commits are fully automated and can't
overlap.  Make the interval on whichever method you use as large as you
can.  I would personally use 60000 (one minute) as a bare minimum, and
would prefer a larger number.
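
As a sketch (host, core name, and the document itself are placeholders
for whatever your indexing code actually sends), commitWithin is just a
parameter on the update request:

  curl 'http://localhost:8983/solr/yourcore/update?commitWithin=60000' \
    -H 'Content-Type: application/json' \
    -d '[{"id":"doc1"}]'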

A soft commit takes less time/resources than a hard commit that opens a
searcher, but they are NOT even close to "free".  Opening the searcher
(which all soft commits do) is the expensive part, not the commit itself.

Regardless of what else you do, you should have autoCommit configured
with openSearcher set to false.  I would personally use a maxTime of
60000 (one minute) or 120000 (two minutes) for autoCommit. 
Recommendations and example configs will commonly have this set to 15
seconds.  That value works well, and does not usually cause problems,
but I like to put less of a load on the server, so I use a larger interval.
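
For reference, the corresponding solrconfig.xml would look something
like the following.  Merge it into your existing <updateHandler>
section rather than replacing that section, and treat the one-minute
values as examples rather than a prescription:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit every minute; flushes to disk, never opens a searcher -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit every minute, makes changes visible; omit this if
         you send commitWithin on the update requests instead -->
    <autoSoftCommit>
      <maxTime>60000</maxTime>
    </autoSoftCommit>
  </updateHandler>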

See this blog post for a detailed discussion:

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

> What would happen if we set *maxWarmingSearchers* count to 1 and make
> simultaneous commits from different endpoints? I understand that Solr
> will prevent opening a new searcher for the second commit, but is that
> all there is to it? Does it mean Solr will serve stale data (i.e. send
> stale data to the slaves), ignoring the changes from the second commit?
> Will these changes be reflected only when a new searcher is initialized,
> and will they be ignored till then? Do we even need searchers on the
> master as we will be querying only the slaves? What purpose do the
> searchers serve exactly? Your time and guidance will be very much
> appreciated. Thank you.

If the maxWarmingSearchers value prevents a commit from opening a
searcher, then changes between the previous commit and that commit will
not be visible *on the master* until a later commit happens and IS able
to open a new searcher.  What happens on the slaves may be a little bit
different, because commits normally only happen on the slave when a
changed index is replicated from the master.

The usual historical number for maxWarmingSearchers in example configs
on older versions is 2, while the intrinsic default is no limit
(Integer.MAX_VALUE).  Starting with 6.4.0, the intrinsic default has
been changed to 1, and the configuration has been removed from the
example configs.  Increasing it is almost always the wrong thing to do,
which is why the default has been lowered to 1.
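
If you do want to set it explicitly on an older version, it goes in the
<query> section of solrconfig.xml -- something like this, in the configs
I've seen:

  <query>
    ...
    <!-- allow only one new searcher to be warming at a time -->
    <maxWarmingSearchers>1</maxWarmingSearchers>
  </query>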

https://issues.apache.org/jira/browse/SOLR-9712

https://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

On the master, you should set up automatic commits as I described above
and should not make explicit commit requests from update clients.  On the
slaves, autoCommit should be set up just like the master, but the other
automatic settings aren't typically necessary.  On slaves, as already
mentioned, commits only happen when the index is replicated from the
master -- you generally don't need to worry about any special
commit-related configuration, aside from making sure that the
autowarmCount value on the caches is not too high.  Masters that do not
receive queries can have autowarmCount set to zero, which can improve
commit speed by making the searcher open faster.
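
As a sketch of what that could look like in the <query> section of
solrconfig.xml on a master that never receives queries (the cache
classes and sizes here are just the stock example values):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                    autowarmCount="0"/>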

To fix problems with exceeding the warming searcher limit, you must
reduce the commit frequency or make commits happen faster.

Side issue:  If you don't want the verbosity of INFO logging, which is
really noisy, set it to WARN.  A properly configured Solr server that is
not having problems should not log ANYTHING when the severity is WARN. 
If the configuration is not optimal, you may see some WARN messages. 
Setting the level to SEVERE is extremely restrictive, and will prevent
you from seeing informative error messages when problems happen.
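
If you want that change to survive restarts, it's the root logger line
in server/resources/log4j.properties.  The appender names below are the
ones I believe ship with recent 6.x versions; check your own file before
copying this:

  # the stock setting is INFO
  log4j.rootLogger=WARN, file, CONSOLE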

Recent Solr versions do have a tendency to log information like this
repeatedly, followed by a stacktrace:

2017-04-14 19:40:00.207 WARN  (qtp895947612-598) [   x:spark2live]
o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_o0e]
java.nio.file.NoSuchFileException:
/index/solr6/data/data/spark2_0/index/segments_o0e

We have an issue filed for this message, but it hasn't yet been fixed. 
It does not seem to cause actual problems, just an annoying log
message.  Until the reason for this error is found and the problem is
fixed, the message can be eliminated from the logs, without hiding
other problems, by changing the level on
org.apache.solr.handler.admin.LukeRequestHandler to ERROR.  This can
be done either in the logging UI or, if you don't want to do it
manually after every restart, in log4j.properties.
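
The log4j.properties line for that would be:

  log4j.logger.org.apache.solr.handler.admin.LukeRequestHandler=ERROR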

https://issues.apache.org/jira/browse/SOLR-9120

Thanks,
Shawn
