First I'd like to say that I wish more people would take the time like you have to fully describe the problem and your observations; it makes it soooo much nicer than having half-a-dozen back and forths! Thanks!
Just so it doesn't get buried in the rest of the response (I do tend to go on....): I suspect you have a suggester configured. The index-based suggesters read through your _entire_ index, all the stored fields from all the documents, and process them into an FST or "sidecar" index. See: https://lucidworks.com/2015/03/04/solr-suggester/. If so, they may be getting rebuilt on the slaves whenever a replication happens. Hmmm, if this is true, let us know. You can tell by removing the suggester from the config and timing again. It seems like in the master/slave config we should copy these down, but I don't know whether that's been tested. If they are being built on the slaves, you might try commenting out all of the buildOn.... bits in the slave configurations (see the config sketch below). Frankly I don't know if building the suggester structures on the master would propagate them to the slave correctly if the slave doesn't build them, but it would certainly be a fat clue if it changed the load time on the slaves, and we could look some more at options.

Observation 1: Allocating 40G of memory for an index that's only 12G seems like overkill. This isn't the root of your problem, but a 12G index shouldn't need anywhere near a 40G JVM. In fact, because MMapDirectory is used (see Uwe Schindler's blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html), I'd guess you can get away with MUCH less memory, maybe as low as 8G or so. The wildcard here is the size of your caches, especially the filterCache configured in solrconfig.xml. Like I mentioned, this isn't the root of your replication issue, just sayin'.

Observation 2: Hard commits (the <autoCommit> setting) are not very expensive with openSearcher=false. Again, this isn't the root of your problem, but consider removing the number-of-docs limitation and just making it time-based, say every minute (sketch below). Long blog on the topic here: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/. You might be accumulating pretty large transaction logs (assuming you haven't disabled them) to no good purpose. Given your observation that the actual transmission of the index takes 2 minutes, this is probably not something to worry about much, but it is worth checking.

Question 1: Solr should be doing nothing other than opening a new searcher, which should take roughly the autowarm time plus (perhaps) the suggester build. Your observation that autowarming takes quite a bit of time (evidenced by much shorter times when you set the counts to zero) is a smoking gun that you're probably doing far too much autowarming. HOWEVER, during this interval the replica should be serving queries from the old searcher, so something else is going on here.

Autowarming is actually pretty simple; perhaps this will help you keep it in mind while tuning. The queryResultCache and filterCache are essentially maps where the key is just the text of the clause (simplifying here). For the queryResultCache the key is the entire search request; for the filterCache, the key is just the "fq" clause. The autowarm count in each is just the number of keys that are replayed when a new searcher is opened. I usually start with a pretty small number, on the order of 10-20 (example below). Their only purpose is to keep the first few searches after a new searcher is opened from experiencing a delay. My bet: you won't notice a measurable difference in query response when you drop the autowarm counts drastically, but you will save the startup time.
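To make the suggester point concrete, here's a rough sketch of what an index-based suggester section in solrconfig.xml often looks like; the component name, field, and lookup/dictionary implementations are placeholders, not something I know about your setup. The buildOn.... lines are the ones to try commenting out on the slaves:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <!-- names, field, and impls below are placeholders; use whatever your config actually has -->
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <!-- these are the expensive bits on a slave: each one triggers a full rebuild -->
    <!-- <str name="buildOnCommit">true</str> -->
    <!-- <str name="buildOnStartup">true</str> -->
    <!-- <str name="buildOnOptimize">true</str> -->
  </lst>
</searchComponent>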
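For the time-based hard commit in Observation 2, a minimal sketch; the one-minute interval is just an example, not a number tuned to your load:

<autoCommit>
  <!-- hard commit every minute; openSearcher=false keeps it cheap -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>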
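And for the autowarm counts, something like this is where I'd start; the sizes shown are just the shipped defaults and 16 is only an illustrative count in that 10-20 range:

<!-- sizes and autowarmCount are illustrative starting points, not recommendations -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>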
I also suspect you can reduce the size of the caches drastically, but I don't know what you have them set to, so it's a guess. As to what's happening such that you serve queries with zero counts, my best guess at this point is that you are rebuilding autosuggesters..... We shouldn't be serving queries from the new searcher during this interval; if that's confirmed we need to raise a JIRA.

Question 2: see above, autosuggester?

Question 3a: documents should become searchable on the slave when 1> all the segments are copied, 2> autowarm is completed. As above, the fact that you get 0-hit responses isn't what _should_ be happening. Autocommit settings are pretty irrelevant on the slave.

Question 3b: a soft commit on the master shouldn't affect the slave at all.

The fact that you have 500 fields shouldn't matter that much in this scenario.

Again, the fact that removing your autowarm settings makes such a difference indicates the counts are excessive, and I have a secondary assumption that you probably have your cache settings far higher than you need, but you'll have to test that if you try to reduce them.... BTW, I often find the 512 default setting more than ample; monitor via admin UI>>core>>plugins/stats to see the hit ratio...

As I told you, I do go on....

Best,
Erick

On Sat, Sep 23, 2017 at 6:40 AM, yasoobhaider <yasoobhaid...@gmail.com> wrote:
> Hi
>
> We have setup a master-slave architecture for our Solr instance.
>
> Number of docs: 2 million
> Collection size: ~12GB when optimized
> Heap size: 40G
> Machine specs: 60G, 8 cores
>
> We are using Solr 6.2.1.
>
> Autocommit Configuration:
>
> <autoCommit>
>   <maxDocs>40000</maxDocs>
>   <maxTime>900000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:3600000}</maxTime>
> </autoSoftCommit>
>
> I have setup the maxDocs at 40k because we do a heavy weekly indexing, and I didn't want a lot of commits happening too fast.
>
> Indexing runs smoothly on master. But when I add a new slave pointing to the master, it takes about 20 minutes for the slave to become queryable.
>
> There are two parts to this latency. First, it takes approximately 13 minutes for the generation of the slave to be same as master. Then it takes another 7 minutes for the instance to become queryable (it returns 0 hits in these 7 minutes).
>
> I checked the logs and the collection is downloaded within two minutes. After that, there is nothing in the logs for next few minutes, even with LoggingInfoSteam set to 'ALL'.
>
> Question 1. What happens after all the files have been downloaded on slave from master? What is Solr doing internally that the generation sync up with master takes so long? Whatever it is doing, should it take that long? (~5 minutes).
>
> After the generation sync up happens, it takes another 7 minutes to start giving results. I set the autowarm count in all caches to 0, which brought it down to 3 minutes.
>
> Question 2. What is happening here in the 3 minutes? Can this also be optimized?
>
> And I wanted to ask another unrelated question regarding when a slave become searchable. I understand that documents on master become searchable if a hard commit happens with openSearcher set to true, or when a soft commit happens. But when do documents become searchable on a slave?
>
> Question 3a. When do documents become searchable on a slave? As soon as a segment is copied over from master? Does softcommit make any sense on a slave, as we are not indexing anything?
> Does autocommit with opensearcher true affect slave in any way?
>
> Question 3b. Does a softcommit on master affect slave in any way? (I only have commit and startup options in my replicateAfter field in solrconfig)
>
> Would appreciate any help.
>
> PS: One of my colleague said that the latency may be because our schema.xml is huge (~500 fields). Question 4. Could that be a reason?
>
> Thanks
> Yasoob Haider
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html