Yes, you can wrap the merge in a parallel expression. You'll need to specify the partitionKeys, which are used to route documents to the worker nodes. For operations like rollups and joins, the partition keys ensure that tuples with the same key values end up on the same worker. With the merge function there may be no need to group records together, so you can partition on the _version_ field or the id. The merge and parallel functions will maintain the sort order of the underlying streams.
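
Roughly, it would look something like the sketch below. The collection names and sort field are taken from your example; the worker collection, worker count, zkHost, fl and qt values are just placeholders you'd replace with your own. Note that partitionKeys goes on each underlying search, and the outer sort should match the merge sort:

parallel(workerCollection,
         merge(
           search(collection1, q="*:*", fl="id,_version_", sort="id asc",
                  partitionKeys="id", qt="/export"),
           search(collection2, q="*:*", fl="id,_version_", sort="id asc",
                  partitionKeys="id", qt="/export"),
           on="id asc"),
         workers="4",
         zkHost="localhost:9983",
         sort="id asc")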
It's not clear, though, that you'll get a large performance increase from parallelizing a merge, because all of the tuples will eventually flow back through a single node. Where parallel provides a major advantage is when you can reduce the number of tuples at the worker nodes and send a smaller set back to the parallel aggregation node. But it's worth trying to see if you do get a performance increase from parallelizing the merge.

Here are the docs on the parallel function:
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-parallel

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 13, 2017 at 3:06 AM, Damien Kamerman <dami...@gmail.com> wrote:
> Hi,
>
> With Solr streaming expressions, is there a way to parallel merge a number
> of Solr streams? Or a way to apply the parallel function to something like
> this?
>
> merge(
>   search(collection1, ...),
>   search(collection2, ...),
>   ...
>   on="id asc")
>
> Cheers,
> Damien.