Erick, first up, thanks for thoroughly answering my questions.

[A] I had read the blog mentioned, and yet failed to 'get it'. Now I understand the flow.

[B] The automatic, heuristic-based approach will, as you said, be difficult to get right; that is why I thought a 'beefiness' index configured per node, similar to load-balancer weights, might get much the same effective result for the most part. I guess it is a feature most people won't need, only those in the process of upgrading their machines.

[C] I will go through the blog and do the empirical analysis. Speaking of caches, I see that for us the filter cache hit ratio is a good 97%, while the document cache hit ratio is below 10%. Does that mean the document cache (size=4096) is not big enough and I should increase the size, or does it mean that we are getting queries that produce too wide a result set, in which case we would probably be better off switching off the document cache altogether, if that is even possible?
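For reference, these are roughly the solrconfig.xml cache entries in question. The documentCache size matches what we currently run; the filterCache numbers are purely illustrative, not a recommendation:

    <!-- Holds stored fields for the documents actually returned by a query.
         It cannot be autowarmed, because Lucene's internal doc ids change
         from searcher to searcher. -->
    <documentCache class="solr.LRUCache"
                   size="4096"
                   initialSize="4096"
                   autowarmCount="0"/>

    <!-- Holds the document sets produced by fq clauses. -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"/>

As far as I can tell there is no explicit on/off switch for the document cache; commenting the <documentCache> element out of solrconfig.xml (or shrinking its size) seems to be the usual way to effectively disable it.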
Thanks,
Himanshu

On Sun, Jul 6, 2014 at 5:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Question 1, both sub-cases.
>
> You're off on the wrong track here; you have to forget about replication.
>
> When documents are added to the index, they get forwarded to _all_ replicas. So the flow is like this...
> 1> leader gets the update request
> 2> leader indexes the docs locally, adds them to its (local) transaction log, _and_ forwards the request to all followers
> 3> followers add the docs to their tlogs and index them locally
> 4> followers ack back to the leader
> 5> leader acks back to the client
>
> There is no replication in the old sense at all in this scenario. I'll add parenthetically that old-style replication _is_ still used to "catch up" a follower that is waaaaaay behind, but the follower is in the "recovering" state if this ever occurs.
>
> About commit: if you commit from the client, the commit is forwarded to all followers (actually, all nodes in the collection). If you have autoCommit configured, each of the replicas will fire its commit when the time period expires.
>
> Here's a blog that might help:
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> [B] Right, SolrCloud really assumes that the machines are pretty similar, so it doesn't provide any way to do what you're asking. Really, you're asking for some way to assign "beefiness" to a node in terms of the load sent to it... I don't know of a way to do that, and I'm not sure it's on the roadmap either.
>
> What you'd really want, though, is some kind of heuristic that was applied automatically. That would have to take into account transient load problems, i.e. replica N happened to get a really nasty query to run and is just slow for a while. I can see this being very tricky to get right, though. Would a GC pause get weighted as "slow" even though the pause could be over already? Anyway, I don't think this is on the roadmap at present, but I could well be wrong.
>
> In your specific example, though (this works because of the convenient 2x....), you could host 2x the number of shards/replicas on the beefier machines.
>
> [C] Right, memory allocation is difficult. The general recommendation is that the memory allocated to Solr in the JVM should be as small as possible, letting the operating system use the rest of the memory for MMapDirectory. See the excellent blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> If you over-allocate memory to the JVM, your GC profile worsens...
>
> Generally, when people throw "memory" around, they're talking about JVM memory...
>
> And don't be misled by the notion of "the index fitting into memory". You're absolutely right that when you get into a swapping situation, performance will suffer. But there are some very interesting tricks played to keep JVM consumption down. For instance, only every 128th term is stored in JVM memory; the other terms are read as needed from OS memory via the MMapDirectory implementations....
>
> Your GC stats look quite reasonable. You can get a snapshot of memory usage by attaching, say, jConsole to the running JVM and seeing what memory usage is after a forced GC. Sounds like you've already seen this, but in case not:
> http://searchhub.org/2011/03/27/garbage-collection-bootcamp-1-0/
> It was written before there was much mileage on the new G1 garbage collector, which has received mixed reviews.
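(A side note to make the autoCommit point above concrete: this is the sort of solrconfig.xml block being discussed. The intervals are purely illustrative; the relevant part is that each replica runs these timers on its own once it has received the update.)

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: writes segments to disk and rolls over the tlog;
           with openSearcher=false it does not change what searches see. -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: makes recent updates visible to searches without
           an expensive flush to disk. -->
      <autoSoftCommit>
        <maxTime>5000</maxTime>
      </autoSoftCommit>
    </updateHandler>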
> Note that the stored fields kept in memory are controlled by the documentCache in solrconfig.xml. I think of this as just something that holds documents while assembling the return list; it really doesn't have anything to do with searching per se, it just keeps disk seeks down during processing for a particular query. I.e. for a query returning 10 rows, only 10 docs will be kept here, not the 5M rows that matched.
>
> Whether 4G is sufficient is.... not answerable. I've doubled the memory requirements by changing the query without changing the index. Here's a blog outlining why we can't predict, and how to get an answer empirically:
>
> http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Sat, Jul 5, 2014 at 1:57 AM, Himanshu Mehrotra <himanshumehrotra2...@gmail.com> wrote:
> > Hi,
> >
> > I had three questions/doubts regarding Solr and SolrCloud functionality. Can anyone help clarify these? I know they are a bit long, please bear with me.
> >
> > [A] Replication related - As I understand it, before SolrCloud, under a classic master/slave replication setup, every 'X' minutes the slaves poll the master and pull the updated index (index segments added and deleted/merged away). And when a client explicitly issues a 'commit', only the master Solr closes/finalizes the current index segment and creates a new current segment. As part of this, index segment merges happen, as well as the 'fsync' that ensures the data is on disk.
> >
> > I read the documentation regarding replication on SolrCloud, but unfortunately it is still not very clear to me.
> >
> > Say I have a SolrCloud setup of 3 Solr servers with just a single shard. Let's call them L (the leader) and F1 and F2, the followers.
> >
> > Case 1: We are not using autoCommit, and explicitly issue 'commit' via the client. How does replication happen now? Does each update to leader L that goes into the tlog get replicated to followers F1 and F2 (where they also put the update in their tlogs) before the client sees a response from leader L? What happens when the client issues a 'commit'? Do the creation of a new segment, the merging of index segments if required, and the fsync happen on all three Solrs, or does that just happen on leader L, with followers F1 and F2 simply syncing the post-commit state of the index? Moreover, does leader L wait for the fsync on followers F1 and F2 before responding successfully to the client? If yes, does it update F1 and then F2 sequentially, or is the process concurrent/parallel via threads?
> >
> > Case 2: We use autoCommit every 'X' minutes and do not issue 'commit' via the client. Is this setup similar to classic master/slave in terms of data/index updates? As in, since autoCommit happens every 'X' minutes, replication happens after the commit, so every 'X' minutes the followers get the updated index. But do simple updates, the ones that go into the tlog, get replicated immediately to the followers' tlogs?
> >
> > Another thing I noticed in the Solr Admin UI is that replication is set to afterCommit. What are the other possible settings for this knob, and what behaviour do we get out of them?
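(On that 'afterCommit' knob: if I understand it correctly, the value shown in the Admin UI comes from the classic ReplicationHandler configuration, whose usual triggers are commit, optimize and startup; in SolrCloud the handler is defined implicitly and, as Erick notes above, only comes into play for old-style recovery. The snippet below is a sketch of the legacy master-side config, not something we actually run:)

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <!-- Possible triggers: commit, optimize, startup; more than one may be listed. -->
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
      </lst>
    </requestHandler>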
> > [B] Load balancing related - In a traditional master/slave setup we use a load balancer to distribute the search query load equally over the slaves. In case one of the slave Solrs is running on a 'beefier' machine (say more RAM or CPU or both) than the others, load balancers allow distributing load by weights, so that we can distribute load in proportion to perceived machine capacity.
> >
> > With a SolrCloud setup, let's take an example: 2 shards, 3 replicas per shard, totaling 6 Solr servers. Say we have servers S1L1, S1F1, S1F2 hosting replicas of shard1 and servers S2L1, S2F1, S2F2 hosting replicas of shard2. S1L1 and S2L1 happen to be the leaders of their respective shards, and let's say S1F2 and S2F1 happen to be twice as big as the other machines (twice the RAM and CPU).
> >
> > Ideally, in such a case, we would want S2F1 and S1F2 to handle twice the search query load of their peers. That is, if 100 search queries come in, we know each shard will receive those 100 queries. So we want S1L1 and S1F1 to handle 25 queries each, and S1F2 to handle 50 queries. Similarly, we would want S2L1 and S2F2 to handle 25 queries each, and S2F1 to handle 50 queries.
> >
> > As far as I understand, this is not possible via the smart client provided in SolrJ. All Solr servers will handle 33% of the query load.
> >
> > The alternative is to use a dumb client and a load balancer over all the servers. But even then, I guess we won't get the correct/desired distribution of queries. Say we put the following weights on each server:
> >
> > 1 - S1L1
> > 1 - S1F1
> > 2 - S1F2
> > 1 - S2L1
> > 2 - S2F1
> > 1 - S2F2
> >
> > Now 1/4 of the total number of requests go to S1F2 directly, plus it receives 1/6 (1/2 * 1/3) of the requests that went to some server on shard2. This totals 10/24 of the request load, not half as we would want.
> >
> > One way could be to choose weights y and x such that y/(2*(y + 2x)) + 1/6 = 1/2. It seems like too much trouble to work them out (y = 4 and x = 1), and every time we add/remove/upgrade servers we would need to recalculate the weights.
> >
> > A simpler alternative, it appears, would be for each Solr node to register its 'query_weight' with ZooKeeper on joining the cluster. This 'query_weight' could be a property, similar to 'solr.solr.home' or 'zkHosts', that we specify on the startup command line for the Solr server, and all smart clients and Solr servers would honour that weight when they distribute load. Is there such a feature planned for SolrCloud?
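(Spelling out the weight arithmetic above, with weight x on each normal node and y on the beefy node of a shard: the beefy node's share is its direct fraction of traffic plus a third of the sub-queries forwarded from the other shard, so

    y/(2*(y + 2x)) + (1/2)*(1/3) = 1/2
    =>  y/(2*(y + 2x)) = 1/3
    =>  3y = 2y + 4x
    =>  y = 4x

and the smallest whole-number weights are indeed x = 1, y = 4.)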
> > [C] GC/Memory usage related - From the documentation and videos available on the internet, it appears that Solr performs well if the index fits into memory and the stored fields fit into memory. Holding just the index in memory has a more degrading impact on Solr performance, and if we don't have enough memory to hold the index, Solr is slower still; the moment the Java process hits swap space, Solr will slow to a crawl.
> >
> > My question is: what is the 'memory' being talked about here? Is it the Java heap we specify via the Xmx and Xms options, or is it the free, buffered, or cached memory as reported by the 'free' command on *nix systems? And how do we know whether our index and stored fields will fit in memory? For example, say the data directory for the core/collection occupies 200MB on disk (150,000 live documents and 180,000 max documents per the Solr UI); is an 8GB machine with Solr configured with Xmx 4G going to be sufficient?
> >
> > Are there any guidelines for configuring the Java heap and total RAM, given an index size and an expected query rate (queries per minute/second)?
> >
> > On a production system I observed via the GC logs that minor collections happen at a rate of about 20 per minute and a full GC happens every seven to ten minutes; are these too high or too low, given that the direct search query load on that Solr node is about 2500 requests per minute? What kind of GC behaviour should I expect from a healthy and fast/optimal Solr node in a SolrCloud setup? Is the answer "it depends on your specific response time and throughput requirements", or is there some kind of rule of thumb that can be followed irrespective of the situation? Or should I just see whether any improvements can be made via regular measure, tweak, measure cycles?
> >
> > Thanks,
> > Himanshu