You can also do the innerJoin in parallel across worker nodes using the parallel function:
hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export", partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export", partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function returns the tuples from the innerJoin, which in this
example is performed on 20 workers. The worker nodes are selected from
"workerCollection", which can be any SolrCloud collection with enough nodes.
The "partitionKeys" parameter has been added to the searches so that results
with the same userId are shuffled to the same worker node. A sketch of posting
this expression to the /stream handler with curl follows the quoted thread
below.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove <dpg...@gmail.com> wrote:

> Mugeesh,
>
> You can use Streaming Aggregation to perform various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
>
> To follow your example, let's assume the following setup:
> Restaurants: available on machine1:8983 with 3 shards, zk at zk1:2345
> Users: available on machine2:8983 with 2 shards, zk at zk2:2345
> Reviews: available on machine1:8983 with 10 shards, zk at zk1:2345
>
> You could send a streaming query to Solr that returns all reviews for
> restaurants in NYC and includes the user's hometown:
>
> hashJoin(
>   innerJoin(
>     search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export"),
>     search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export"),
>     on="userId"
>   ),
>   hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>   on="restaurantId"
> )
>
> Note that the number of shards doesn't matter and doesn't need to be
> considered as part of your query. Were you to send this off to a URL for
> results, it would look like this:
>
> http://machine1:8983/solr/users/stream?stream=[the expression above]
>
> Additional information about the Streaming API, Streaming Aggregation, and
> Streaming Expressions can be found at
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> though this is currently incomplete, as a lot of the new features have yet
> to be added to the documentation.
>
> For those interested, joins were added under tickets
> https://issues.apache.org/jira/browse/SOLR-7584 and
> https://issues.apache.org/jira/browse/SOLR-8188.
>
> - Dennis
>
>
> On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain <muge...@gmail.com> wrote:
>
> > I have created 3 cores on the same machine using SolrCloud.
> > Cores: Restaurant, User, Review
> > Each core has only 1 shard and 2 replicas.
> >
> > Questions:
> > 1.) Is it possible to use a join among the 3 cores on the same machine
> > (or different machines)?
> > 2.) I am struggling with how to use a join among the 3 cores in
> > SolrCloud mode.
> >
> > The client is not interested in de-normalizing the data.
> >
> > Please give some suggestions on how to solve this problem.
> >
> > Thanks,
> > Mugeesh
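
For anyone trying this from the command line, here is a minimal sketch of how
the parallel expression above could be POSTed to the /stream handler with curl.
It reuses the collection, host, and zkHost names from the examples in this
thread, and it uses the "stream" parameter shown in Dennis's URL (released
versions of Solr 6+ use "expr" instead), so adjust the names and parameter to
your own setup:

  # Sketch only: hosts, collections, and the "stream" parameter name are taken
  # from the examples above and may need to be changed for your deployment.
  curl --data-urlencode 'stream=hashJoin(
      parallel(workerCollection,
        innerJoin(
          search(users, q="*:*", fl="userId, full_name, hometown", sort="userId asc", zkHost="zk2:2345", qt="/export", partitionKeys="userId"),
          search(reviews, q="*:*", fl="userId, review, score", sort="userId asc", zkHost="zk1:2345", qt="/export", partitionKeys="userId"),
          on="userId"),
        workers="20", zkHost="zk1:2345", sort="userId asc"),
      hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
      on="restaurantId")' \
    "http://machine1:8983/solr/users/stream"

The expression is sent as a form-encoded POST body, so newlines and quotes
inside it do not need manual escaping beyond the URL encoding curl applies.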