Hi Erick,
I don't think there is a bug with searcher reopening - this is a scenario with a new slave:
"But when I add a *new* slave pointing to the master…"
So it is expected to have zero results until replication finishes.

Regards,
Emir

> On 23 Sep 2017, at 19:21, Erick Erickson <erickerick...@gmail.com> wrote:
>
> First I'd like to say that I wish more people would take the time like you have to fully describe the problem and your observations; it makes it soooo much nicer than having half-a-dozen back and forths! Thanks!
>
> Just so it doesn't get buried in the rest of the response, I do tend to go on.... I suspect you have a suggester configured. The index-based suggesters read through your _entire_ index, all the stored fields from all the documents, and process them into an FST or "sidecar" index. See: https://lucidworks.com/2015/03/04/solr-suggester/. If this is true, they might be being built on the slaves whenever a replication happens. Hmmm, if this is true, let us know. You can tell by removing the suggester from the config and timing again. It seems like in the master/slave config we should copy these down, but I don't know if it's been tested.
>
> If they are being built on the slaves, you might try commenting out all of the buildOn.... bits in the slave configurations. Frankly, I don't know if building the suggester structures on the master would propagate them to the slave correctly if the slave doesn't build them, but it would certainly be a fat clue if it changed the load time on the slaves, and we could look some more at options.
>
> Observation 1: Allocating 40G of memory for an index that's only 12G seems like overkill. This isn't the root of your problem, but a 12G index shouldn't need anywhere near 40G of JVM heap. In fact, due to MMapDirectory being used (see Uwe Schindler's blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html) I'd guess you can get away with MUCH less memory, maybe as low as 8G or so. The wildcard here would be the size of your caches, especially your filterCache configured in solrconfig.xml. Like I mentioned, this isn't the root of your replication issue, just sayin'.
>
> Observation 2: A hard commit (the <autoCommit> setting) is not a very expensive operation with openSearcher=false. Again, this isn't the root of your problem, but consider removing the number-of-docs limitation and just making it time-based, say every minute. Long blog on the topic here: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/. You might be accumulating pretty large transaction logs (assuming you haven't disabled them) to no good purpose. Given your observation that the actual transmission of the index takes 2 minutes, this is probably not something to worry about much, but it is worth checking.
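(For illustration, a purely time-based hard commit along the lines Erick suggests might look like the sketch below in solrconfig.xml. The one-minute interval is just an example value, not something specified in this thread.)

    <!-- Hedged sketch: hard commit on a timer only, never opening a searcher.
         60000 ms (one minute) is an illustrative value taken from the
         "say every minute" suggestion above, not a tested recommendation. -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>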
> Question 1:
>
> Solr should be doing nothing other than opening a new searcher, which should be roughly the "autowarm" time on the master plus (perhaps) the suggester build. Your observation that autowarming takes quite a bit of time (evidenced by much shorter times when you set the counts to zero) is a smoking gun that you're probably doing far too much autowarming. HOWEVER, during this interval the replica should be serving queries from the old searcher, so something else is going on here. Autowarming is actually pretty simple; perhaps this will help you keep it in mind while tuning:
>
> The queryResultCache and filterCache are essentially maps where the key is just the text of the clause (simplifying here). So for the queryResultCache the key is the entire search request. For the filterCache, the key is just the "fq" clause. The autowarm count in each just means the number of keys that are replayed when a new searcher is opened. I usually start with a pretty small number, on the order of 10-20. Their purpose is just to keep from experiencing a delay when the first few searches are performed after a searcher is opened.
>
> My bet: you won't notice a measurable difference in query response when dropping the autowarm counts drastically, but you will save the startup time. I also suspect you can reduce the size of the caches drastically, but I don't know what you have them set to; it's a guess.
>
> As to what's happening such that you serve queries with zero counts, my best guess at this point is that you are rebuilding autosuggesters..... We shouldn't be serving queries from the new searcher during this interval; if confirmed, we need to raise a JIRA.
>
> Question 2: see above, autosuggester?
>
> Question 3a: documents should become searchable on the slave when 1> all the segments are copied, 2> autowarm is completed. As above, the fact that you get 0-hit responses isn't what _should_ be happening.
>
> Autocommit settings are pretty irrelevant on the slave.
>
> Question 3b: a soft commit on the master shouldn't affect the slave at all.
>
> The fact that you have 500 fields shouldn't matter that much in this scenario. Again, the fact that removing your autowarm settings makes such a difference indicates the counts are excessive, and I have a secondary assumption that you probably have your cache settings far higher than you need, but you'll have to test if you try to reduce them.... BTW, I often find the 512 default setting more than ample; monitor via admin UI>>core>>plugins/stats to see the hit ratio...
>
> As I told you, I do go on....
>
> Best,
> Erick
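(Again purely for illustration: cache definitions following Erick's suggestions above, i.e. the 512 default size and an autowarm count in the 10-20 range, might look roughly like this in solrconfig.xml. The cache classes and the value 16 are plausible defaults for Solr 6.x, not settings taken from this thread.)

    <!-- Hedged sketch: modest caches with small autowarm counts so that
         opening a new searcher replays only a handful of entries. -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="16"/>

    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="16"/>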
> On Sat, Sep 23, 2017 at 6:40 AM, yasoobhaider <yasoobhaid...@gmail.com> wrote:
>>
>> Hi
>>
>> We have set up a master-slave architecture for our Solr instance.
>>
>> Number of docs: 2 million
>> Collection size: ~12GB when optimized
>> Heap size: 40G
>> Machine specs: 60G, 8 cores
>>
>> We are using Solr 6.2.1.
>>
>> Autocommit configuration:
>>
>> <autoCommit>
>>   <maxDocs>40000</maxDocs>
>>   <maxTime>900000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:3600000}</maxTime>
>> </autoSoftCommit>
>>
>> I have set maxDocs to 40k because we do a heavy weekly indexing, and I didn't want a lot of commits happening too fast.
>>
>> Indexing runs smoothly on the master. But when I add a new slave pointing to the master, it takes about 20 minutes for the slave to become queryable.
>>
>> There are two parts to this latency. First, it takes approximately 13 minutes for the generation of the slave to be the same as the master's. Then it takes another 7 minutes for the instance to become queryable (it returns 0 hits in these 7 minutes).
>>
>> I checked the logs and the collection is downloaded within two minutes. After that, there is nothing in the logs for the next few minutes, even with LoggingInfoStream set to 'ALL'.
>>
>> Question 1. What happens after all the files have been downloaded on the slave from the master? What is Solr doing internally that the generation sync-up with the master takes so long? Whatever it is doing, should it take that long? (~5 minutes)
>>
>> After the generation sync-up happens, it takes another 7 minutes to start giving results. I set the autowarm count in all caches to 0, which brought it down to 3 minutes.
>>
>> Question 2. What is happening here in the 3 minutes? Can this also be optimized?
>>
>> And I wanted to ask another unrelated question regarding when a slave becomes searchable. I understand that documents on the master become searchable if a hard commit happens with openSearcher set to true, or when a soft commit happens. But when do documents become searchable on a slave?
>>
>> Question 3a. When do documents become searchable on a slave? As soon as a segment is copied over from the master? Does a soft commit make any sense on a slave, as we are not indexing anything? Does autocommit with openSearcher true affect the slave in any way?
>>
>> Question 3b. Does a soft commit on the master affect the slave in any way? (I only have the commit and startup options in my replicateAfter field in solrconfig)
>>
>> Would appreciate any help.
>>
>> PS: One of my colleagues said that the latency may be because our schema.xml is huge (~500 fields). Question 4. Could that be a reason?
>>
>> Thanks
>> Yasoob Haider
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
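(For readers following the thread: the master/slave setup and the replicateAfter options being discussed are configured through the replication request handler. A rough, hedged sketch of what such a configuration typically looks like in solrconfig.xml for Solr 6.x is below; the host name, core name and poll interval are made-up placeholders, not values from this thread.)

    <!-- On the master: publish the index after commits and on startup,
         matching the replicateAfter options mentioned in the question. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
      </lst>
    </requestHandler>

    <!-- On the slave: poll the master periodically. masterUrl and
         pollInterval are illustrative placeholders only. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/collection1/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>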