Re: Streaming expression not hitting all replicas?

Timothy Potter Mon, 23 May 2016 16:32:47 -0700

Thanks Joel, that cleared things up nicely ... using 4 workers against
4 shards resulted in 16 queries to the collection. However, not all
replicas were used for all shards, so it's not as balanced as I
thought it would be, but we're dealing with small numbers of shards
and replicas here.
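
For anyone who wants to play with the numbers, here is a minimal sketch of the arithmetic. It is not Solr code. It assumes the model Joel describes below (each worker sends one query to one randomly chosen replica of every shard), and the function names are made up for illustration.

```python
import random
from collections import Counter


def simulate_parallel_request(workers, shards, replicas_per_shard, seed=None):
    """Count the queries each replica receives for one parallel request.

    Assumed model (not taken from Solr source): every worker sends exactly
    one query to one randomly chosen replica of every shard.
    """
    rng = random.Random(seed)
    hits = Counter()
    for _worker in range(workers):
        for shard in range(shards):
            replica = rng.randrange(replicas_per_shard)
            hits[(shard, replica)] += 1
    return hits


def summarize(workers, shards, replicas_per_shard, seed=None):
    """Return totals for one simulated parallel request."""
    hits = simulate_parallel_request(workers, shards, replicas_per_shard, seed)
    total_replicas = shards * replicas_per_shard
    total_requests = sum(hits.values())
    return {
        "total_requests": total_requests,
        "replicas_used": len(hits),
        "total_replicas": total_replicas,
        "mean_requests_per_replica": total_requests / total_replicas,
    }


if __name__ == "__main__":
    # 4 workers, 4 shards, 2 replicas per shard (the small setup in this thread).
    print(summarize(4, 4, 2))
    # 40 workers, 40 shards, 5 replicas per shard (Joel's example).
    print(summarize(40, 40, 5))
```

With only 4 workers, each shard sees just 4 random picks, so the split between its replicas can easily come out uneven in any single run. The per-replica average only becomes a good guide with larger numbers.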


On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein <[email protected]> wrote:
> Streaming expressions will utilize all replicas of a cluster when the
> number of workers >= the number of replicas.
>
> For example, suppose there are 40 workers, 40 shards, and 5 replicas per shard.
>
> For a single parallel request:
>
> Each worker will send 1 query to a random replica in each shard. This is
> 1600 requests (40 workers x 40 shards). The 1600 requests will be spread
> evenly across all 200 nodes in the cluster, with each node handling 8
> requests. Each request will return 1/1600 of the result set.
>
> If you add another row of replicas, the 1600 requests will be
> handled by 240 nodes.
>
> -----
>
> In streaming expressions you use the parallel function to send requests to
> workers.
>
> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
> queries.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein <[email protected]> wrote:
>
>> The image is the correct flow. Are you using workers?
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter <[email protected]>
>> wrote:
>>
>>> This image from the wiki kind of gives that impression to me:
>>>
>>>
>>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>>
>>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>> <[email protected]> wrote:
>>> > I _think_ this is a distinction between
>>> > serving the query and processing the results. The
>>> > query is the standard Solr processing returning
>>> > results from one replica per shard.
>>> >
>>> > Those results can be partitioned out to N Solr instances
>>> > for sub-processing, where N is however many worker
>>> > nodes you specified, which may or may not host
>>> > any replicas of that collection.
>>> >
>>> > At least I think that's what's up, but then again this is
>>> > new to me too.
>>> >
>>> > Which bits of the doc anyway? Sounds like some
>>> > clarification is in order.
>>> >
>>> > Best,
>>> > Erick
>>> >
>>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter <[email protected]>
>>> wrote:
>>> >> I've seen docs and diagrams that seem to indicate a streaming
>>> >> expression can utilize all replicas of a shard but I'm seeing only 1
>>> >> replica per shard (I have 2) being queried.
>>> >>
>>> >> All replicas are on the same host for my experimentation, could that
>>> >> be the issue? What are the circumstances where all replicas will be
>>> >> utilized?
>>> >>
>>> >> Or is this a misunderstanding of the docs?
>>>
>>
>>
