Re: Streaming expression not hitting all replicas?

Erick Erickson Mon, 23 May 2016 18:09:07 -0700

Well, ya learn somethin' new every day....


On Mon, May 23, 2016 at 4:31 PM, Timothy Potter <thelabd...@gmail.com> wrote:
> Thanks Joel, that cleared things up nicely ... using 4 workers against
> 4 shards resulted in 16 queries to the collection. However, not all
> replicas were used for all shards, so it's not as balanced as I
> thought it would be, but we're dealing with small numbers of shards
> and replicas here.
>
> On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> Streaming expressions will utilize all replicas of a cluster when the
>> number of workers >= the number of replicas.
>>
>> For example, say there are 40 workers, 40 shards, and 5 replicas per shard.
>>
>> For a single parallel request:
>>
>> Each worker will send 1 query to a random replica in each shard. This is
>> 1600 requests (40 workers x 40 shards). The 1600 requests will be spread
>> evenly across all 200 nodes in the cluster, with each node handling 8
>> requests. Each request will return 1/1600 of the result set.
>>
>> If you add another row of replicas the 1600 requests will be
>> handled by 240 nodes.
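
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name and the returned keys are made up for illustration, and it assumes, as in Joel's example, that every replica lives on its own node and that random replica selection evens out on average.

```python
def request_fanout(workers, shards, replicas_per_shard):
    """Estimate how one parallel request fans out across the cluster.

    Assumes one replica per node and an even spread on average,
    since each worker picks a random replica in each shard.
    """
    total_requests = workers * shards      # each worker queries every shard once
    nodes = shards * replicas_per_shard    # one node per replica (assumption)
    return {
        "total_requests": total_requests,
        "nodes": nodes,
        "avg_requests_per_node": total_requests / nodes,
        "fraction_per_request": 1 / total_requests,
    }


if __name__ == "__main__":
    print(request_fanout(40, 40, 5))   # the example in the message
    print(request_fanout(40, 40, 6))   # one more row of replicas
    print(request_fanout(4, 4, 2))     # Tim's 4 workers against 4 shards
```

With few workers and few replicas the average hides a lot of variance, which would fit Tim's observation that not every replica was hit.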
>>
>> -----
>>
>> In streaming expressions you use the parallel function to send requests to
>> workers.
>>
>> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
>> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
>> queries.
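
For the streaming expression side, a rough sketch of what a parallel request could look like, built as a string in Python. The collection name, field name, base URL and helper names are placeholders, not anything from this thread; the expression follows the general parallel(...)/search(...) shape with partitionKeys and workers, so check it against the docs for your Solr version before relying on it. Nothing is sent to Solr here, the code only builds the expression and the /stream URL.

```python
from urllib.parse import urlencode


def parallel_expr(collection, workers, key_field="id"):
    """Wrap a search() in parallel() so the tuples are split across workers.

    partitionKeys tells Solr how to hash tuples to workers; the sort on
    the outer parallel() matches the sort of the inner search().
    """
    inner = (
        f'search({collection}, q="*:*", fl="{key_field}", '
        f'sort="{key_field} asc", partitionKeys="{key_field}", qt="/export")'
    )
    return (
        f'parallel({collection}, {inner}, '
        f'workers="{workers}", sort="{key_field} asc")'
    )


def stream_url(base_url, collection, expr):
    """Build the URL for the collection's /stream handler with the expression URL-encoded."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


if __name__ == "__main__":
    expr = parallel_expr("collection1", 4)   # hypothetical collection name
    print(expr)
    print(stream_url("http://localhost:8983/solr", "collection1", expr))
```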
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> The image is the correct flow. Are you using workers?
>>>
>>>
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter <thelabd...@gmail.com>
>>> wrote:
>>>
>>>> This image from the wiki kind of gives that impression to me:
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>>>
>>>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>>> <erickerick...@gmail.com> wrote:
>>>> > I _think_ this is a distinction between
>>>> > serving the query and processing the results. The
>>>> > query is the standard Solr processing returning
>>>> > results from one replica per shard.
>>>> >
>>>> > Those results can be partitioned out to N Solr instances
>>>> > for sub-processing, where N is however many worker
>>>> > nodes you specified; those nodes may or may not host
>>>> > any replicas of that collection.
>>>> >
>>>> > At least I think that's what's up, but then again this is
>>>> > new to me too.
>>>> >
>>>> > Which bits of the doc anyway? Sounds like some
>>>> > clarification is in order.
>>>> >
>>>> > Best,
>>>> > Erick
>>>> >
>>>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter <thelabd...@gmail.com>
>>>> wrote:
>>>> >> I've seen docs and diagrams that seem to indicate a streaming
>>>> >> expression can utilize all replicas of a shard but I'm seeing only 1
>>>> >> replica per shard (I have 2) being queried.
>>>> >>
>>>> >> All replicas are on the same host for my experimentation, could that
>>>> >> be the issue? What are the circumstances where all replicas will be
>>>> >> utilized?
>>>> >>
>>>> >> Or is this a misunderstanding of the docs?
>>>>
>>>
>>>
