Well, ya learn somethin' new every day....
On Mon, May 23, 2016 at 4:31 PM, Timothy Potter <thelabd...@gmail.com> wrote:
> Thanks Joel, that cleared things up nicely ... using 4 workers against
> 4 shards resulted in 16 queries to the collection. However, not all
> replicas were used for all shards, so it's not as balanced as I
> thought it would be, but we're dealing with small numbers of shards
> and replicas here.
>
> On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> Streaming expressions will utilize all replicas of a cluster when the
>> number of workers >= the number of replicas.
>>
>> For example, suppose there are 40 workers, 40 shards, and 5 replicas
>> per shard.
>>
>> For a single parallel request:
>>
>> Each worker will send 1 query to a random replica in each shard. This is
>> 1,600 requests. The 1,600 requests will be spread evenly across all
>> 200 nodes in the cluster, with each node handling 8 requests. Each request
>> will return 1/1600 of the result set.
>>
>> If you add another row of replicas, the 1,600 requests will be
>> handled by 240 nodes.
>>
>> -----
>>
>> In streaming expressions you use the parallel function to send requests to
>> workers.
>>
>> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
>> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
>> queries.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> The image is the correct flow. Are you using workers?
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter <thelabd...@gmail.com>
>>> wrote:
>>>
>>>> This image from the wiki kind of gives that impression to me:
>>>>
>>>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>>>
>>>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>>> <erickerick...@gmail.com> wrote:
>>>> > I _think_ this is a distinction between
>>>> > serving the query and processing the results. The
>>>> > query is the standard Solr processing returning
>>>> > results from one replica per shard.
>>>> >
>>>> > Those results can be partitioned out to N Solr instances
>>>> > for sub-processing, where N is however many worker
>>>> > nodes you specified that may or may not be host
>>>> > to any replicas of that collection.
>>>> >
>>>> > At least I think that's what's up, but then again this is
>>>> > new to me too.
>>>> >
>>>> > Which bits of the doc anyway? Sounds like some
>>>> > clarification is in order.
>>>> >
>>>> > Best,
>>>> > Erick
>>>> >
>>>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter <thelabd...@gmail.com>
>>>> wrote:
>>>> >> I've seen docs and diagrams that seem to indicate a streaming
>>>> >> expression can utilize all replicas of a shard but I'm seeing only 1
>>>> >> replica per shard (I have 2) being queried.
>>>> >>
>>>> >> All replicas are on the same host for my experimentation, could that
>>>> >> be the issue? What are the circumstances where all replicas will be
>>>> >> utilized?
>>>> >>
>>>> >> Or is this a mis-understanding of the docs?
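
For anyone who wants to check the numbers in Joel's example, here is a minimal Python sketch of the arithmetic only. It does not talk to Solr. The function name and the returned keys are made up for illustration, and it assumes one replica per node and a perfectly even spread. Since each worker picks a random replica per shard, the per-node figure is best read as an average, which fits Tim's observation that not every replica was hit in his small test.

```python
def parallel_request_counts(workers, shards, replicas_per_shard):
    """Back-of-the-envelope counts for one parallel streaming request.

    Hypothetical helper: assumes each worker sends exactly one query per
    shard, and that every replica lives on its own node.
    """
    total_requests = workers * shards       # one query per worker per shard
    nodes = shards * replicas_per_shard     # one node per replica (assumed)
    return {
        "total_requests": total_requests,
        "nodes": nodes,
        # Average only: replicas are chosen at random, so real counts vary.
        "avg_requests_per_node": total_requests / nodes,
        # Each request returns an equal slice of the result set.
        "result_fraction_per_request": 1 / total_requests,
    }


if __name__ == "__main__":
    # Joel's example: 40 workers, 40 shards, 5 replicas per shard.
    print(parallel_request_counts(40, 40, 5))
    # After adding another row of replicas (6 per shard).
    print(parallel_request_counts(40, 40, 6))
    # Tim's test: 4 workers against 4 shards with 2 replicas each.
    print(parallel_request_counts(4, 4, 2))
```

With Joel's inputs this gives 1,600 requests over 200 nodes, and with Tim's inputs it gives the 16 queries he reported.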