Thanks Ere. I've taken a look at the discussion here:
http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
This is how I was imagining TLOG & PULL replicas would work, so if this
functionality does get developed, it would be useful to me. A rough sketch
of how I'd imagine using it is below.
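For illustration only: something like the following is what I have in mind
with SolrJ. "preferReplicaTypes" is just the name proposed in that thread
(with SOLR-10880 as a possible base); no such parameter exists in Solr
today, so the name and syntax here are imagined.

    import org.apache.solr.client.solrj.SolrQuery;

    public class PreferPullSketch {
      public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        // Hypothetical parameter from the linked discussion; not yet in
        // Solr, so treat the name and accepted values as placeholders.
        q.set("preferReplicaTypes", "TLOG,PULL");
        System.out.println(q); // prints the encoded parameter string
      }
    }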
I still have two questions at the moment:

1. I am running the single-shard scenario. I'm thinking of using a
dedicated HTTP load-balancer in front of the PULL replicas only, with
read-only queries sent straight to the load-balancer. In this situation,
the healthy PULL replicas *should* handle the queries on the node itself
without a proxy hop (assuming state=active). New PULL replicas added to
the load-balancer will internally proxy queries to the other PULL or TLOG
replicas while in state=recovering, until they switch to state=active. Is
my understanding correct? (A sketch of the client setup I have in mind
follows below the questions.)

2. Is it all worth it? Is there any advantage to running a cluster of
3 TLOG + 10 PULL replicas vs running 13 TLOG replicas?
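To make question 1 concrete, here is roughly the read path I mean: a
plain, non-cluster-aware SolrJ client pointed at the load-balancer, so
each query lands on whichever PULL replica the balancer picks. The host
and collection names are placeholders for my setup.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PullReadPath {
      public static void main(String[] args) throws Exception {
        // "pull-lb.example.com" stands in for the HTTP load-balancer that
        // fronts only the PULL replicas; "mycollection" is a placeholder.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
            "http://pull-lb.example.com:8983/solr").build()) {
          QueryResponse rsp =
              client.query("mycollection", new SolrQuery("*:*"));
          // A replica still in state=recovering should proxy this on to
          // an active replica internally rather than answer itself.
          System.out.println(rsp.getResults().getNumFound());
        }
      }
    }

On 12 February 2018 at 19:25, Ere Maijala <ere.maij...@helsinki.fi> wrote:

> Your question about directing queries to PULL replicas only has been
> discussed on the list. Look for topic "Limit search queries only to pull
> replicas". What I'd like to see is something similar to the
> preferLocalShards parameter. It could be something like
> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
> SOLR-10880 could be used as a base for such functionality, and I'm
> considering taking a stab at implementing it.
>
> --Ere
>
> Greg Roodt kirjoitti 12.2.2018 klo 6.55:
>
>> Thank you both for your very detailed answers.
>>
>> This is great to know. I knew that SolrJ had the cluster-aware
>> knowledge (via ZooKeeper), but I was wondering what something like curl
>> would do. Great to know that internally the cluster will proxy queries
>> to the appropriate place regardless.
>>
>> I am running the single-shard scenario. I'm thinking of using a
>> dedicated HTTP load-balancer in front of the PULL replicas only with
>> read-only queries directed straight at the load-balancer. In this
>> situation, the healthy PULL replicas *should* handle the queries on the
>> node itself without a proxy hop (assuming state=active). New PULL
>> replicas added to the load-balancer will internally proxy queries to
>> the other PULL or TLOG replicas while in state=recovering until the
>> switch to state=active.
>>
>> Is my understanding correct?
>>
>> Is this sensible to do, or is it not worth it due to the smart proxying
>> that SolrCloud can do anyway?
>>
>> If the TLOG and PULL replicas are so similar, is there any real
>> advantage to having a mixed cluster? I assume a bit less work is
>> required across the cluster to propagate writes if you only have 3 TLOG
>> nodes vs 10+ PULL nodes? Or would it be better to just have 13 TLOG
>> nodes?
>>
>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflo...@apple.com>
>> wrote:
>>
>>> On the last question:
>>> For writes: yes. Writes are going to be sent to the shard leader, and
>>> since PULL replicas can't be leaders, it's going to be a TLOG replica.
>>> If you are using CloudSolrClient, then this routing will be done
>>> directly from the client (since it will send the update to the
>>> leader), and if you are using some other HTTP client, then yes, the
>>> PULL replica will forward the update, the same way any non-leader node
>>> would.
>>>
>>> For reads: this won't happen today, and any replica can respond to
>>> queries. I do believe there is value in this kind of routing logic;
>>> sometimes you simply don't want the leader to handle any queries,
>>> especially when queries can be expensive.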
>>> You could do this today if you want, by putting some load balancer in
>>> front and just directing your queries to the nodes you know are PULL,
>>> but keep in mind that this would only work in the single-shard
>>> scenario, and only if you hit an active replica (otherwise, as you
>>> said, the query will be routed to any other node of the shard,
>>> regardless of the type). If you have multiple shards, then you need to
>>> use the "shards" parameter and tell Solr exactly which nodes you want
>>> to hit for each shard (the "shards" approach can also be done in the
>>> single-shard case, although you would be adding an extra hop, I
>>> believe).
>>>
>>> Tomás
>>> Sent from my iPhone
>>>
>>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <gro...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I have a question around how queries are routed and load-balanced in
>>>> a cluster of mixed TLOG and PULL replicas.
>>>>
>>>> I thought that I might have to put a load-balancer in front of the
>>>> PULL replicas and direct queries at them manually as nodes are added
>>>> and removed as PULL replicas. However, it seems that SolrCloud
>>>> handles this automatically?
>>>>
>>>> If I add a new PULL replica node, it goes into state="recovering"
>>>> while it pulls the core. As expected. What happens if queries are
>>>> directed at this node while in this state? From what I am observing,
>>>> the query gets directed to another node?
>>>>
>>>> If SolrCloud is handling the routing of requests to active nodes,
>>>> will it automatically favour PULL replicas for read queries and TLOG
>>>> replicas for writes?
>>>>
>>>> Thanks
>>>> Greg
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
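PS: for the multi-shard case Tomás mentions, the "shards" parameter is
existing Solr functionality, so the manual routing can be sketched like
this. The host and core names are placeholders; each comma-separated
entry picks the replica(s) to query for one shard.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ManualShardRouting {
      public static void main(String[] args) throws Exception {
        SolrQuery q = new SolrQuery("*:*");
        // One entry per shard; "|" can separate alternate replicas for
        // the same shard. Hosts and cores are made up for this sketch.
        q.set("shards",
            "pull1.example.com:8983/solr/mycollection_shard1_replica_p1,"
          + "pull2.example.com:8983/solr/mycollection_shard2_replica_p2");
        try (HttpSolrClient client = new HttpSolrClient.Builder(
            "http://pull1.example.com:8983/solr").build()) {
          System.out.println(
              client.query("mycollection", q).getResults().getNumFound());
        }
      }
    }

As Tomás notes, this adds an extra hop in the single-shard case, which is
why the plain load-balancer approach above appeals to me.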