Thanks Joel, that cleared things up nicely. Using 4 workers against 4 shards resulted in 16 queries to the collection. However, not every replica of every shard was queried, so the load isn't as balanced as I thought it would be, but we're dealing with small numbers of shards and replicas here.
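
For anyone following along, here is a rough Python sketch of the fan-out as I understand it from Joel's description below. It is only a simulation, not Solr code: the function name is made up, and it assumes each worker picks a replica uniformly at random and independently for each shard.

```python
import random
from collections import Counter


def simulate_parallel_request(workers, shards, replicas, seed=None):
    """Count the queries each (shard, replica) pair receives when every
    worker sends one query to a randomly chosen replica of every shard.

    Assumption: replica choice is uniform and independent per query.
    """
    rng = random.Random(seed)
    hits = Counter()
    for _worker in range(workers):
        for shard in range(shards):
            replica = rng.randrange(replicas)  # random replica of this shard
            hits[(shard, replica)] += 1
    return hits


# My setup: 4 workers, 4 shards, 2 replicas per shard.
hits = simulate_parallel_request(workers=4, shards=4, replicas=2, seed=1)
total_queries = sum(hits.values())  # always workers * shards
replicas_used = len(hits)           # can be fewer than shards * replicas
print(total_queries, "queries touched", replicas_used, "of", 4 * 2, "replicas")
```

The total is always workers * shards (16 here, 1600 in Joel's 40 x 40 example), but with so few workers per shard some replicas can easily be skipped. Under the uniform-random assumption, the chance that all 4 workers pick the same replica of a given 2-replica shard is 2 * (1/2)^4 = 1/8, so roughly 41% of runs leave at least one replica unused across 4 shards.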
On Mon, May 23, 2016 at 12:58 PM, Joel Bernstein <joels...@gmail.com> wrote:
> Streaming expressions will utilize all replicas of a cluster when the
> number of workers >= the number of replicas.
>
> For example, if there are 40 workers, 40 shards, and 5 replicas per shard:
>
> For a single parallel request:
>
> Each worker will send 1 query to a random replica in each shard. This is
> 1600 requests. The 1600 requests will be spread evenly across all
> 200 nodes in the cluster, with each node handling 8 requests. Each request
> will return 1/1600 of the result set.
>
> If you add another row of replicas, the 1600 requests will be
> handled by 240 nodes.
>
> -----
>
> In streaming expressions you use the parallel function to send requests to
> workers.
>
> In SQL you specify aggregationMode=map_reduce and workers=X. The SQL
> interface only goes into parallel mode for GROUP BY and SELECT DISTINCT
> queries.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 23, 2016 at 7:17 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
>> The image is the correct flow. Are you using workers?
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Mon, May 23, 2016 at 7:16 PM, Timothy Potter <thelabd...@gmail.com>
>> wrote:
>>
>>> This image from the wiki kind of gives that impression to me:
>>>
>>> https://cwiki.apache.org/confluence/download/attachments/61311194/cluster.png?version=1&modificationDate=1447365789000&api=v2
>>>
>>> On Mon, May 23, 2016 at 11:50 AM, Erick Erickson
>>> <erickerick...@gmail.com> wrote:
>>> > I _think_ this is a distinction between
>>> > serving the query and processing the results. The
>>> > query is the standard Solr processing returning
>>> > results from one replica per shard.
>>> >
>>> > Those results can be partitioned out to N Solr instances
>>> > for sub-processing, where N is however many worker
>>> > nodes you specified, which may or may not host
>>> > any replicas of that collection.
>>> >
>>> > At least I think that's what's up, but then again this is
>>> > new to me too.
>>> >
>>> > Which bits of the doc anyway? Sounds like some
>>> > clarification is in order.
>>> >
>>> > Best,
>>> > Erick
>>> >
>>> > On Mon, May 23, 2016 at 9:32 AM, Timothy Potter <thelabd...@gmail.com>
>>> wrote:
>>> >> I've seen docs and diagrams that seem to indicate a streaming
>>> >> expression can utilize all replicas of a shard, but I'm seeing only 1
>>> >> replica per shard (I have 2) being queried.
>>> >>
>>> >> All replicas are on the same host for my experimentation; could that
>>> >> be the issue? What are the circumstances where all replicas will be
>>> >> utilized?
>>> >>
>>> >> Or is this a misunderstanding of the docs?