Do you use EmbeddedSolr in the query server? There is a memory leak that
shows up after a large number of replications.
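
For context, "EmbeddedSolr" here means running Solr in-process through
SolrJ's EmbeddedSolrServer, as opposed to talking to a standalone instance
over HTTP. A rough sketch of the two client setups, using the 1.4-era SolrJ
class names; the solr home path and URL below are placeholders, not anything
from this thread:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class ClientSetup {

      // In-process Solr -- the "EmbeddedSolr" case asked about above.
      static SolrServer embedded() throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home"); // placeholder path
        CoreContainer container = new CoreContainer.Initializer().initialize();
        return new EmbeddedSolrServer(container, ""); // "" = default core
      }

      // Plain HTTP client against a standalone Solr instance.
      static SolrServer overHttp() throws Exception {
        return new CommonsHttpSolrServer("http://slave-host:8983/solr"); // placeholder URL
      }
    }
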
On Wed, Nov 3, 2010 at 8:28 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Ah, but reading Peter's email message I reference more carefully, it seems
> that Solr already DOES provide an info-level log warning you about
> over-lapping warming, awesome. (But again, I'm pretty sure it does NOT throw
> an exception or return an HTTP error in that condition, based on my and
> others' experience.)
>
>> To check if your Solr environment is suffering from this, turn on INFO
>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>> onDeckSearchers=x'.
>
> Sweet, good to know, and I'll definitely add this to my debugging toolbox.
> Peter's listserv message really ought to be a wiki page, I think. Any
> reason for me not to just add it as a new one with title "Commit frequency
> and auto-warming" or something like that? Unless it's already in the wiki
> somewhere I haven't found, assuming the wiki will let an ordinary
> user-created account add a new page.
>
> Jonathan Rochkind wrote:
>>
>> I hadn't looked at the code, am not familiar with Solr code, and can't say
>> what that code does.
>>
>> But I have experienced issues that I _believe_ were caused by too frequent
>> commits causing over-lapping searcher preparation. And I've definitely seen
>> Solr documentation that suggests this is an issue. Let me find it now to
>> see if the experts think these documented suggestions are still correct or
>> not:
>>
>> "On the other hand, autowarming (populating) a new collection could take a
>> lot of time, especially since it uses only one thread and one CPU. If your
>> settings fire off snapinstaller too frequently, then a Solr slave could be
>> in the undesirable condition of handing-off queries to one (old)
>> collection, and, while warming a new collection, a second “new” one could
>> be snapped and begin warming!
>>
>> If we attempted to solve such a situation, we would have to invalidate the
>> first “new” collection in order to use the second one, then when a “third”
>> new collection would be snapped and warmed, we would have to invalidate the
>> “second” new collection, and so on ad infinitum. A completely warmed
>> collection would never make it to full term before it was aborted. This can
>> be prevented with a properly tuned configuration so new collections do not
>> get installed too rapidly."
>>
>> http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs
>>
>> I think I've seen that same advice on another wiki page, not specifically
>> about replication, but just about commit frequency balanced with
>> auto-warming, leading to overlapping warming, leading to spiraling RAM/CPU
>> usage -- but NOT an exception being thrown or HTTP error delivered.
>>
>> I can't find it on the wiki, but here's a listserv post with someone
>> reporting findings that match my understanding:
>> http://osdir.com/ml/solr-user.lucene.apache.org/2010-09/msg00528.html
>>
>> How does this advice square with the code Lance found? Is my
>> understanding of how frequent commits can interact with the time it takes
>> to warm a new collection correct? Appreciate any additional info.
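
An aside on the "too frequent commits" point: on the indexing-client side the
usual mitigation is to batch updates and commit once per batch, so a commit
cannot arrive before the warming triggered by the previous one has finished.
A rough SolrJ sketch with 1.4-era class names; the URL and field names are
placeholders, not from Simon's setup:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer master = new CommonsHttpSolrServer("http://master-host:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("title", "document " + i);
          batch.add(doc);

          // Send documents in chunks, but do not commit per chunk.
          if (batch.size() == 1000) {
            master.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          master.add(batch);
        }

        // One explicit commit per batch run, so only one new searcher has to
        // warm, instead of one per document or per chunk.
        master.commit();
      }
    }

The same arithmetic applies on the slave: the replication poll interval has
to stay longer than the time warming actually takes.
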
>> Lance Norskog wrote:
>>>
>>> Isn't that what this code does?
>>>
>>>     onDeckSearchers++;
>>>     if (onDeckSearchers < 1) {
>>>       // should never happen... just a sanity check
>>>       log.error(logid+"ERROR!!! onDeckSearchers is " + onDeckSearchers);
>>>       onDeckSearchers=1;  // reset
>>>     } else if (onDeckSearchers > maxWarmingSearchers) {
>>>       onDeckSearchers--;
>>>       String msg="Error opening new searcher. exceeded limit of maxWarmingSearchers="
>>>           + maxWarmingSearchers + ", try again later.";
>>>       log.warn(logid+""+ msg);
>>>       // HTTP 503==service unavailable, or 409==Conflict
>>>       throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,msg,true);
>>>     } else if (onDeckSearchers > 1) {
>>>       log.info(logid+"PERFORMANCE WARNING: Overlapping onDeckSearchers=" + onDeckSearchers);
>>>     }
>>>
>>> On Tue, Nov 2, 2010 at 10:02 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>>>>
>>>> It's definitely a known 'issue' that you can't replicate (or do any other
>>>> kind of index change, including a commit) more frequently than your
>>>> warming queries take to complete, or you'll wind up with something like
>>>> you've seen.
>>>>
>>>> It's in some documentation somewhere I saw, for sure.
>>>>
>>>> The advice to 'just query against the master' is kind of odd, because,
>>>> then... why have a slave at all, if you aren't going to query against it?
>>>> I guess just for backup purposes.
>>>>
>>>> But even with just one Solr, or querying master, if you commit at a rate
>>>> such that commits come before the warming queries can complete, you're
>>>> going to have the same issue.
>>>>
>>>> The only answer I know of is "Don't commit (or replicate) at a faster rate
>>>> than it takes your warming to complete." You can reduce your warming
>>>> queries/operations, or reduce your commit/replicate frequency.
>>>>
>>>> Would be interesting/useful if Solr noticed this going on, and gave you
>>>> some kind of error in the log (or even an exception when started with a
>>>> certain parameter for testing): "Overlapping warming queries, you're
>>>> committing too fast" or something. Because it's easy to make this happen
>>>> without realizing it, and then your Solr does what Simon describes: runs
>>>> out of RAM and/or uses a whole lot of CPU and disk IO.
>>>>
>>>> Lance Norskog wrote:
>>>>>
>>>>> You should query against the indexer. I'm impressed that you got 5s
>>>>> replication to work reliably.
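
Relating this back to the SolrCore snippet above: when the
maxWarmingSearchers limit is exceeded, the update request fails with a 503
"try again later". A hedged sketch of how an indexing client could back off
and retry, assuming the 503 surfaces in SolrJ as a SolrException (depending
on the client version it may instead arrive wrapped in a SolrServerException):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrException;

    public class CommitWithBackoff {

      // Retry a commit that was rejected because too many searchers were
      // already warming; sleeping is what gives the earlier searcher time
      // to finish.
      static void commitWithRetry(SolrServer server) throws Exception {
        int attempts = 0;
        while (true) {
          try {
            server.commit();
            return;
          } catch (SolrException e) {
            // 503 == SERVICE_UNAVAILABLE, the code thrown by the
            // maxWarmingSearchers check quoted above.
            if (e.code() == 503 && ++attempts < 5) {
              Thread.sleep(30 * 1000L);
            } else {
              throw e;
            }
          }
        }
      }

      public static void main(String[] args) throws Exception {
        commitWithRetry(new CommonsHttpSolrServer("http://master-host:8983/solr")); // placeholder URL
      }
    }

Retrying only papers over the symptom, though; the real fix is still what
Jonathan describes: keep the commit/replication interval longer than the
warm-up time.
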
>>>>> On Mon, Nov 1, 2010 at 4:27 PM, Simon Wistow <si...@thegestalt.org> wrote:
>>>>>>
>>>>>> We've been trying to get a setup in which a slave replicates from a
>>>>>> master every few seconds (ideally every second but currently we have it
>>>>>> set at every 5s).
>>>>>>
>>>>>> Everything seems to work fine until, periodically, the slave just stops
>>>>>> responding from what looks like it running out of memory:
>>>>>>
>>>>>> org.apache.catalina.core.StandardWrapperValve invoke
>>>>>> SEVERE: Servlet.service() for servlet jsp threw exception
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>
>>>>>> (our monitoring seems to confirm this).
>>>>>>
>>>>>> Looking around, my suspicion is that it takes new Readers longer to warm
>>>>>> than the gap between replications, and thus they just build up until all
>>>>>> memory is consumed (which, I suppose, isn't really memory 'leaking' per
>>>>>> se, more just resource consumption).
>>>>>>
>>>>>> That said, we've tried turning off caching on the slave and that didn't
>>>>>> help either, so it's possible I'm wrong.
>>>>>>
>>>>>> Is there anything we can do about this? I'm reluctant to increase the
>>>>>> heap space since I suspect that will mean that there's just a longer
>>>>>> period between failures. Might Zoie help here? Or should we just query
>>>>>> against the Master?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Simon

--
Lance Norskog
goks...@gmail.com