Ah, but reading the email message from Peter that I referenced more
carefully, it seems that Solr already DOES provide an info-level log
message warning you about overlapping warming, awesome. (But again, I'm
pretty sure it does NOT throw an exception or return an HTTP error in
that condition, based on my and others' experience.)
> To check if your Solr environment is suffering from this, turn on INFO
> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
> onDeckSearchers=x'.
Sweet, good to know, and I'll definitely add this to my debugging
toolbox. Peter's listserv message really ought to be a wiki page, I
think. Any reason for me not to just add it as a new one with title
"Commit frequency and auto-warming" or something like that? Unless it's
already in the wiki somewhere I haven't found, assuming the wiki will
let an ordinary user-created account add a new page.
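
A rough sketch of automating that log check, assuming the log lines are already available as strings. The class and method names are hypothetical, and the sample lines in main are made up for illustration rather than copied from a real Solr log; only the 'PERFORMANCE WARNING: Overlapping onDeckSearchers=' text comes from Peter's message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: scans log lines for the
// overlapping-warming message and reports the highest count seen.
final class OverlapLogScan {
    private static final Pattern WARNING =
        Pattern.compile("PERFORMANCE WARNING: Overlapping onDeckSearchers=(\\d+)");

    // Returns the largest onDeckSearchers value found, or 0 if the
    // warning never appears in the given lines.
    static int maxOverlap(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                max = Math.max(max, Integer.parseInt(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up sample lines; a real log will carry its own prefixes.
        List<String> sample = Arrays.asList(
            "INFO: PERFORMANCE WARNING: Overlapping onDeckSearchers=2",
            "INFO: some unrelated message",
            "INFO: PERFORMANCE WARNING: Overlapping onDeckSearchers=3");
        System.out.println("max overlapping searchers: " + maxOverlap(sample));
    }
}
```

A result above 1 would suggest commits (or replications) are arriving before warming finishes.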
//
Jonathan Rochkind wrote:
I hadn't looked at the code, am not familiar with Solr code, and can't
say what that code does.
But I have experienced issues that I _believe_ were caused by too
frequent commits causing overlapping searcher preparation. And I've
definitely seen Solr documentation that suggests this is an issue. Let
me find it now to see if the experts think these documented suggestions
are still correct or not:
"On the other hand, autowarming (populating) a new collection could take
a lot of time, especially since it uses only one thread and one CPU. If
your settings fire off snapinstaller too frequently, then a Solr slave
could be in the undesirable condition of handing-off queries to one
(old) collection, and, while warming a new collection, a second “new”
one could be snapped and begin warming!
If we attempted to solve such a situation, we would have to invalidate
the first “new” collection in order to use the second one, then when a
“third” new collection would be snapped and warmed, we would have to
invalidate the “second” new collection, and so on ad infinitum. A
completely warmed collection would never make it to full term before it
was aborted. This can be prevented with a properly tuned configuration
so new collections do not get installed too rapidly. "
http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs
I think I've seen that same advice on another wiki page that wasn't
specifically about replication, but just about commit frequency
balanced against auto-warming: overlapping warming leads to spiraling
RAM/CPU usage -- but NOT to an exception being thrown or an HTTP error
being returned.
I can't find it on the wiki, but here's a listserv post with someone
reporting findings that match my understanding:
http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00528.html
How does this advice square with the code Lance found? Is my
understanding of how frequent commits can interact with the time it
takes to warm a new collection correct? Appreciate any additional info.
Lance Norskog wrote:
Isn't that what this code does?
    onDeckSearchers++;
    if (onDeckSearchers < 1) {
      // should never happen... just a sanity check
      log.error(logid+"ERROR!!! onDeckSearchers is " + onDeckSearchers);
      onDeckSearchers=1;  // reset
    } else if (onDeckSearchers > maxWarmingSearchers) {
      onDeckSearchers--;
      String msg="Error opening new searcher. exceeded limit of maxWarmingSearchers="+maxWarmingSearchers + ", try again later.";
      log.warn(logid+""+ msg);
      // HTTP 503==service unavailable, or 409==Conflict
      throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,msg,true);
    } else if (onDeckSearchers > 1) {
      log.info(logid+"PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
    }
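
Read literally, the fragment above has two distinct outcomes for overlapping warming: an info-level log line once more than one searcher is on deck, and an exception only once the count exceeds maxWarmingSearchers. The standalone sketch below mirrors those branches so they can be tried outside Solr. It is not Solr's class: WarmingGate, Outcome, and the method names are hypothetical, and it returns an enum where the quoted code logs or throws.

```java
// Standalone sketch of the counter logic in the quoted fragment.
// All names here are hypothetical; this is not Solr code.
final class WarmingGate {
    enum Outcome { OK, OVERLAPPING, REJECTED }

    private final int maxWarmingSearchers;
    private int onDeckSearchers = 0;

    WarmingGate(int maxWarmingSearchers) {
        this.maxWarmingSearchers = maxWarmingSearchers;
    }

    // Called when a commit asks for a new searcher to be warmed.
    synchronized Outcome open() {
        onDeckSearchers++;
        if (onDeckSearchers > maxWarmingSearchers) {
            onDeckSearchers--;           // undo: this searcher is not opened
            return Outcome.REJECTED;     // quoted code throws SERVICE_UNAVAILABLE here
        }
        if (onDeckSearchers > 1) {
            return Outcome.OVERLAPPING;  // quoted code logs the PERFORMANCE WARNING here
        }
        return Outcome.OK;
    }

    // Called when a searcher finishes warming and is registered.
    synchronized void finishedWarming() {
        if (onDeckSearchers > 0) {
            onDeckSearchers--;
        }
    }

    synchronized int onDeck() {
        return onDeckSearchers;
    }

    public static void main(String[] args) {
        WarmingGate gate = new WarmingGate(2);
        System.out.println(gate.open()); // first searcher starts warming
        System.out.println(gate.open()); // second commit arrives before the first finishes
        System.out.println(gate.open()); // third commit exceeds the limit of 2
    }
}
```

Under this reading, a large maxWarmingSearchers value would let searchers pile up with only the info-level warning, which would be consistent with seeing memory growth but no exception or HTTP error.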
On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
It's definitely a known 'issue' that you can't replicate (or do any other
kind of index change, including a commit) at a faster frequency than your
warming queries take to complete, or you'll wind up with something like
you've seen.
It's in some documentation somewhere I saw, for sure.
The advice to 'just query against the master' is kind of odd, because,
then... why have a slave at all, if you aren't going to query against it? I
guess just for backup purposes.
But even with just one Solr, or querying the master, if you commit at a
rate such that commits come before the warming queries can complete,
you're going to have the same issue.
The only answer I know of is "Don't commit (or replicate) at a faster rate
than it takes your warming to complete." You can reduce your warming
queries/operations, or reduce your commit/replicate frequency.
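
As a back-of-the-envelope illustration of that rule: if commits arrive every I milliseconds and each warm-up takes W milliseconds, then right after a commit roughly ceil(W / I) searchers are warming at once. The sketch below assumes warm time stays constant under load and that nothing caps the count; the class name is hypothetical and the 12-second warm time is a made-up figure (only the 5s interval comes from the thread).

```java
// Hypothetical helper for estimating overlap; not part of Solr.
final class WarmingOverlap {
    // Number of searchers warming concurrently just after a commit,
    // assuming a fixed commit interval and a fixed warm-up time.
    static long concurrentWarmers(long warmMillis, long commitIntervalMillis) {
        if (warmMillis <= 0 || commitIntervalMillis <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        // Integer ceiling of warmMillis / commitIntervalMillis.
        return (warmMillis + commitIntervalMillis - 1) / commitIntervalMillis;
    }

    public static void main(String[] args) {
        // Assumed 12s warm-up against a 5s replication interval.
        System.out.println(concurrentWarmers(12_000, 5_000));
        // Warm-up shorter than the interval: no overlap.
        System.out.println(concurrentWarmers(3_000, 5_000));
    }
}
```

A result of 1 means warming finishes before the next commit. Anything higher means overlapping searchers, and in practice probably more than the estimate, since concurrent warm-ups compete for the same CPU and memory.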
Would be interesting/useful if Solr noticed this going on, and gave you some
kind of error in the log (or even an exception when started with a certain
parameter for testing) "Overlapping warming queries, you're committing too
fast" or something. Because it's easy to make this happen without realizing
it, and then your Solr does what Simon says, runs out of RAM and/or uses a
whole lot of CPU and disk io.
Lance Norskog wrote:
You should query against the indexer. I'm impressed that you got 5s
replication to work reliably.
On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <si...@thegestalt.org> wrote:
We've been trying to get a setup in which a slave replicates from a
master every few seconds (ideally every second but currently we have it
set at every 5s).
Everything seems to work fine until, periodically, the slave just stops
responding, apparently because it is running out of memory:
org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.OutOfMemoryError: Java heap space
(our monitoring seems to confirm this).
Looking around, my suspicion is that new Readers take longer to warm
than the gap between replications, and thus they just build up until all
memory is consumed (which, I suppose, isn't really memory 'leaking' per
se, more just resource consumption).
That said, we've tried turning off caching on the slave and that didn't
help either so it's possible I'm wrong.
Is there anything we can do about this? I'm reluctant to increase the
heap space since I suspect that will mean that there's just a longer
period between failures. Might Zoie help here? Or should we just query
against the Master?
Thanks,
Simon