Thanks so much again Tomas! You've answered my questions and I clearly understand now. Great work!
On 13 February 2018 at 09:13, Tomas Fernandez Lobbe <tflo...@apple.com> wrote:

> > On Feb 12, 2018, at 12:06 PM, Greg Roodt <gro...@gmail.com> wrote:
> >
> > Thanks Ere. I've taken a look at the discussion here:
> > http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
> > This is how I was imagining TLOG & PULL replicas would work, so if this
> > functionality does get developed, it would be useful to me.
> >
> > I still have 2 questions at the moment:
> > 1. I am running the single shard scenario. I'm thinking of using a
> > dedicated HTTP load-balancer in front of the PULL replicas only, with
> > read-only queries directed directly at the load-balancer. In this
> > situation, the healthy PULL replicas *should* handle the queries on the
> > node itself without a proxy hop (assuming state=active). New PULL replicas
> > added to the load-balancer will internally proxy queries to the other PULL
> > or TLOG replicas while in state=recovering until the switch to
> > state=active. Is my understanding correct?
>
> Yes
>
> > 2. Is it all worth it? Is there any advantage to running a cluster of 3
> > TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
>
> I don’t have a definitive answer; this will depend on your specific use
> case. As Erick said, there is very little work that non-leader TLOG
> replicas do for each update, and having all TLOG replicas means that with
> a single active replica you could in theory still handle updates. It’s
> sometimes nice to separate query traffic from update traffic, but this can
> still be done if you have all TLOG replicas and you just make sure you
> don’t query the leader…
> One nice characteristic of PULL replicas is that they can’t go into
> Leader Initiated Recovery (LIR) state: even if there is some sort of
> network partition, they’ll remain in the active state as long as they can
> reach ZooKeeper, even if they can’t talk to the leader (note that this
> means they may be responding with outdated data for an undetermined amount
> of time, until they can replicate from the leader again). Also, since
> updates are not sent to all the replicas (only to the TLOG replicas),
> updates should be faster with 3 TLOG vs 13 TLOG replicas.
>
> Tomás
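For anyone who lands on this thread later: a rough sketch of creating such a
mixed collection (1 shard, 3 TLOG + 10 PULL) with SolrJ. The ZooKeeper
address, collection name and config set name are placeholders, and it assumes
the Solr 7.x createCollection overload that takes NRT/TLOG/PULL replica
counts, so treat it as a sketch rather than a recipe.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateMixedCollection {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address, collection name and config set name.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build()) {
            // One shard, 0 NRT replicas, 3 TLOG replicas (one of which is
            // elected leader) and 10 PULL replicas that only replicate the
            // index from the leader.
            CollectionAdminRequest
                    .createCollection("mycollection", "myconfig", 1, 0, 3, 10)
                    .process(client);
        }
    }
}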
> > On 12 February 2018 at 19:25, Ere Maijala <ere.maij...@helsinki.fi> wrote:
> >
> >> Your question about directing queries to PULL replicas only has been
> >> discussed on the list. Look for the topic "Limit search queries only to
> >> pull replicas". What I'd like to see is something similar to the
> >> preferLocalShards parameter. It could be something like
> >> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
> >> SOLR-10880 could be used as a base for such functionality, and I'm
> >> considering taking a stab at implementing it.
> >>
> >> --Ere
> >>
> >> Greg Roodt wrote on 12 Feb 2018 at 6:55:
> >>
> >>> Thank you both for your very detailed answers.
> >>>
> >>> This is great to know. I knew that SolrJ had the cluster-aware knowledge
> >>> (via ZooKeeper), but I was wondering what something like curl would do.
> >>> Great to know that internally the cluster will proxy queries to the
> >>> appropriate place regardless.
> >>>
> >>> I am running the single shard scenario. I'm thinking of using a dedicated
> >>> HTTP load-balancer in front of the PULL replicas only, with read-only
> >>> queries directed directly at the load-balancer. In this situation, the
> >>> healthy PULL replicas *should* handle the queries on the node itself
> >>> without a proxy hop (assuming state=active). New PULL replicas added to
> >>> the load-balancer will internally proxy queries to the other PULL or TLOG
> >>> replicas while in state=recovering until the switch to state=active.
> >>>
> >>> Is my understanding correct?
> >>>
> >>> Is this sensible to do, or is it not worth it due to the smart proxying
> >>> that SolrCloud can do anyway?
> >>>
> >>> If the TLOG and PULL replicas are so similar, is there any real advantage
> >>> to having a mixed cluster? I assume a bit less work is required across
> >>> the cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
> >>> nodes? Or would it be better to just have 13 TLOG nodes?
> >>>
> >>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflo...@apple.com>
> >>> wrote:
> >>>
> >>>> On the last question:
> >>>> For writes: yes. Writes are going to be sent to the shard leader, and
> >>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica.
> >>>> If you are using CloudSolrClient, then this routing will be done
> >>>> directly from the client (since it will send the update to the leader),
> >>>> and if you are using some other HTTP client, then yes, the PULL replica
> >>>> will forward the update, the same way any non-leader node would.
> >>>>
> >>>> For reads: this won’t happen today, and any replica can respond to
> >>>> queries. I do believe there is value in this kind of routing logic;
> >>>> sometimes you simply don’t want the leader to handle any queries,
> >>>> especially when queries can be expensive. You could do this today if you
> >>>> want, by putting some load balancer in front and just directing your
> >>>> queries to the nodes you know are PULL, but keep in mind that this would
> >>>> only work in the single shard scenario, and only if you hit an active
> >>>> replica (otherwise, as you said, the query will be routed to any other
> >>>> node of the shard, regardless of the type). If you have multiple shards,
> >>>> then you need to use the “shards” parameter and tell Solr exactly which
> >>>> nodes you want to hit for each shard (the “shards” approach can also be
> >>>> done in the single shard case, although you would be adding an extra
> >>>> hop, I believe).
> >>>>
> >>>> Tomás
> >>>> Sent from my iPhone
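To make the two routing paths above concrete, a rough SolrJ sketch: the update
goes through CloudSolrClient (which sends it to the shard leader, i.e. a TLOG
replica), and the query is pinned to specific replicas with the "shards"
parameter. The ZooKeeper address, host names and core names below are
placeholders, not real nodes.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RoutingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address and collection name.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build()) {
            client.setDefaultCollection("mycollection");

            // Write path: CloudSolrClient looks up the shard leader in
            // ZooKeeper and sends the update straight to it, so the update
            // always lands on a TLOG replica (PULL replicas cannot be leaders).
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            client.add(doc);
            client.commit();

            // Read path: pin the query to specific replica cores with the
            // "shards" parameter. Core names are placeholders; alternatives
            // within a shard are separated by "|", and with multiple shards
            // you list one group per shard, comma-separated.
            SolrQuery query = new SolrQuery("*:*");
            query.set("shards",
                    "pull-host1:8983/solr/mycollection_shard1_replica_p4|"
                  + "pull-host2:8983/solr/mycollection_shard1_replica_p6");
            client.query(query);
        }
    }
}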
> >>>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <gro...@gmail.com> wrote:
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I have a question around how queries are routed and load-balanced in a
> >>>>> cluster of mixed TLOG and PULL replicas.
> >>>>>
> >>>>> I thought that I might have to put a load-balancer in front of the PULL
> >>>>> replicas and direct queries at them manually as nodes are added and
> >>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
> >>>>> automatically?
> >>>>>
> >>>>> If I add a new PULL replica node, it goes into state="recovering" while
> >>>>> it pulls the core. As expected. What happens if queries are directed at
> >>>>> this node while in this state? From what I am observing, the query gets
> >>>>> directed to another node?
> >>>>>
> >>>>> If SolrCloud is handling the routing of requests to active nodes, will
> >>>>> it automatically favour PULL replicas for read queries and TLOG
> >>>>> replicas for writes?
> >>>>>
> >>>>> Thanks
> >>>>> Greg
> >>
> >> --
> >> Ere Maijala
> >> Kansalliskirjasto / The National Library of Finland
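P.S. For anyone building the same setup: a sketch of how a load-balancer
target list could be derived from cluster state, keeping only PULL replicas
that are marked active and whose node is live. It assumes the SolrJ 7.x
CloudSolrClient/ZkStateReader API; the ZooKeeper address and collection name
are placeholders.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

public class ActivePullReplicas {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address and collection name.
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181")
                .build()) {
            client.connect();
            ClusterState state = client.getZkStateReader().getClusterState();
            Set<String> liveNodes = state.getLiveNodes();
            DocCollection coll = state.getCollection("mycollection");

            // Only PULL replicas that are active *and* on a live node should
            // receive load-balancer traffic; a recovering replica would just
            // proxy the query to another replica anyway.
            List<String> targets = new ArrayList<>();
            for (Replica replica : coll.getReplicas()) {
                if (replica.getType() == Replica.Type.PULL
                        && replica.getState() == Replica.State.ACTIVE
                        && liveNodes.contains(replica.getNodeName())) {
                    targets.add(replica.getCoreUrl());
                }
            }
            targets.forEach(System.out::println);
        }
    }
}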