Emir: OK, thanks for pointing that out, that's a relief!
Erick

On Mon, Sep 25, 2017 at 1:03 AM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Erick,
>
> I don't think there are any bugs with searcher reopening - this is a scenario with a new slave:
>
> "But when I add a *new* slave pointing to the master…"
>
> So it is expected to have zero results until replication finishes.
>
> Regards,
> Emir
>
>> On 23 Sep 2017, at 19:21, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> First, I'd like to say that I wish more people would take the time, as you have, to fully describe the problem and their observations. It makes things so much nicer than half a dozen back-and-forths. Thanks!
>>
>> Just so it doesn't get buried in the rest of the response (I do tend to go on...): I suspect you have a suggester configured. The index-based suggesters read through your _entire_ index, all the stored fields from all the documents, and process them into an FST or "sidecar" index. See: https://lucidworks.com/2015/03/04/solr-suggester/. If so, the suggesters might be being rebuilt on the slaves whenever a replication happens. If that turns out to be true, let us know. You can tell by removing the suggester from the config and timing again. It seems like in a master/slave setup we should copy these structures down to the slaves, but I don't know whether that's been tested.
>>
>> If they are being built on the slaves, you might try commenting out all of the buildOn.... bits in the slave configurations. Frankly, I don't know whether building the suggester structures on the master would propagate them correctly to a slave that doesn't build them itself, but it would certainly be a fat clue if it changed the load time on the slaves, and we could look some more at options.
>>
>> Observation 1: Allocating 40G of memory for an index that is only 12G seems like overkill. This isn't the root of your problem, but a 12G index shouldn't need anywhere near 40G of JVM heap. In fact, because MMapDirectory is used (see Uwe Schindler's blog here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html), I'd guess you can get away with MUCH less memory, maybe as low as 8G or so. The wildcard here is the size of your caches, especially the filterCache configured in solrconfig.xml. Like I mentioned, this isn't the root of your replication issue, just sayin'.
>>
>> Observation 2: A hard commit (the <autoCommit> setting) is not a very expensive operation with openSearcher=false. Again, this isn't the root of your problem, but consider removing the document-count limit and making the commit purely time-based, say every minute. There's a long blog on the topic here: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/. You might be accumulating pretty large transaction logs (assuming you haven't disabled them) to no good purpose. Given your observation that the actual transmission of the index takes 2 minutes, this is probably not something to worry about much, but it is worth checking.
>>
>> Question 1:
>>
>> Solr should be doing nothing other than opening a new searcher, which should take roughly the "autowarm" time on the master plus (perhaps) the suggester build. Your observation that autowarming takes quite a bit of time (evidenced by the much shorter times when you set the counts to zero) is a smoking gun that you're probably doing far too much autowarming.
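>> To make that concrete, here's the kind of cache section in solrconfig.xml I have in mind. This is only a sketch; the sizes and autowarm counts are illustrative placeholders, not recommendations, and what autowarmCount actually does is explained below:
>>
>>   <!-- replay at most 16 cached entries into the new searcher when it opens -->
>>   <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
>>   <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>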
>> HOWEVER, during this interval the replica should be serving queries from the old searcher, so something else is going on here. Autowarming is actually pretty simple; perhaps this will help you keep it in mind while tuning:
>>
>> The queryResultCache and filterCache are essentially maps where the key is just the text of the clause (simplifying here). For the queryResultCache the key is the entire search request; for the filterCache, the key is just the "fq" clause. The autowarm count in each is simply the number of keys that are replayed when a new searcher is opened. I usually start with a pretty small number, on the order of 10-20. Their only purpose is to keep the first few searches performed after a searcher is opened from experiencing a delay.
>>
>> My bet: you won't notice a measurable difference in query response when you drop the autowarm counts drastically, but you will save the startup time. I also suspect you can reduce the size of the caches drastically, but I don't know what you have them set to; it's a guess.
>>
>> As to what's happening such that you serve queries with zero counts, my best guess at this point is that you are rebuilding autosuggesters.... We shouldn't be serving queries from the new searcher during this interval; if that's confirmed, we need to raise a JIRA.
>>
>> Question 2: see above; autosuggester?
>>
>> Question 3a: documents should become searchable on the slave when 1> all the segments are copied, and 2> autowarming is completed. As above, the fact that you get 0-hit responses isn't what _should_ be happening.
>>
>> Autocommit settings are pretty much irrelevant on the slave.
>>
>> Question 3b: a soft commit on the master shouldn't affect the slave at all.
>>
>> The fact that you have 500 fields shouldn't matter that much in this scenario. Again, the fact that removing your autowarm settings makes such a difference indicates the counts are excessive, and my secondary assumption is that you probably have your cache sizes set far higher than you need, but you'll have to test that if you try reducing them.... BTW, I often find the default setting of 512 more than ample; monitor the hit ratio via admin UI >> core >> plugins/stats...
>>
>> As I told you, I do go on....
>>
>> Best,
>> Erick
>>
>> On Sat, Sep 23, 2017 at 6:40 AM, yasoobhaider <yasoobhaid...@gmail.com> wrote:
>>> Hi
>>>
>>> We have set up a master-slave architecture for our Solr instance.
>>>
>>> Number of docs: 2 million
>>> Collection size: ~12GB when optimized
>>> Heap size: 40G
>>> Machine specs: 60G RAM, 8 cores
>>>
>>> We are using Solr 6.2.1.
>>>
>>> Autocommit configuration:
>>>
>>> <autoCommit>
>>>   <maxDocs>40000</maxDocs>
>>>   <maxTime>900000</maxTime>
>>>   <openSearcher>false</openSearcher>
>>> </autoCommit>
>>>
>>> <autoSoftCommit>
>>>   <maxTime>${solr.autoSoftCommit.maxTime:3600000}</maxTime>
>>> </autoSoftCommit>
>>>
>>> I set maxDocs to 40k because we do heavy weekly indexing, and I didn't want a lot of commits happening too fast.
>>>
>>> Indexing runs smoothly on the master. But when I add a new slave pointing to the master, it takes about 20 minutes for the slave to become queryable.
>>>
>>> There are two parts to this latency. First, it takes approximately 13 minutes for the generation of the slave to become the same as the master's. Then it takes another 7 minutes for the instance to become queryable (it returns 0 hits during these 7 minutes).
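>>> (For reference, replication on the slave is just the stock ReplicationHandler setup, roughly as sketched below; the master URL and core name here are placeholders, not our real ones:)
>>>
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="slave">
>>>     <str name="masterUrl">http://master-host:8983/solr/core1</str>
>>>     <str name="pollInterval">00:00:60</str>
>>>   </lst>
>>> </requestHandler>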
>>> I checked the logs, and the collection is downloaded within two minutes. After that, there is nothing in the logs for the next few minutes, even with LoggingInfoStream set to 'ALL'.
>>>
>>> Question 1: What happens after all the files have been downloaded from the master to the slave? What is Solr doing internally that makes the generation sync-up with the master take so long? Whatever it is doing, should it take that long (~5 minutes)?
>>>
>>> After the generation sync-up happens, it takes another 7 minutes to start giving results. I set the autowarm count in all caches to 0, which brought this down to 3 minutes.
>>>
>>> Question 2: What is happening in these 3 minutes? Can this also be optimized?
>>>
>>> I also wanted to ask an unrelated question about when a slave becomes searchable. I understand that documents on the master become searchable when a hard commit happens with openSearcher set to true, or when a soft commit happens. But when do documents become searchable on a slave?
>>>
>>> Question 3a: When do documents become searchable on a slave? As soon as a segment is copied over from the master? Does a soft commit make any sense on a slave, since we are not indexing anything there? Does autocommit with openSearcher=true affect the slave in any way?
>>>
>>> Question 3b: Does a soft commit on the master affect the slave in any way? (I only have the commit and startup options in my replicateAfter field in solrconfig.)
>>>
>>> Would appreciate any help.
>>>
>>> PS: One of my colleagues said that the latency may be because our schema.xml is huge (~500 fields). Question 4: Could that be a reason?
>>>
>>> Thanks
>>> Yasoob Haider
>>>
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html